NuExtract-1.5-tiny

Maintained By
numind


Property      Value
Base Model    Qwen2.5-0.5B
Languages     English, French, Spanish, German, Portuguese, Italian
License       Apache 2.0
Hugging Face  numind/NuExtract-1.5-tiny

What is NuExtract-1.5-tiny?

NuExtract-1.5-tiny is a language model fine-tuned for structured information extraction. Built on Qwen2.5-0.5B, it extracts structured data from documents in several languages while keeping resource requirements low. It is notable for its ability to handle long documents and for its focus on pure extraction: returning information as it appears in the text rather than generating new content.

Implementation Details

The model is Qwen2.5-0.5B fine-tuned specifically for information extraction. It takes two inputs: the text to process and a JSON template that defines the structure of the information to extract. For best results, use a temperature at or very close to 0; higher values can degrade its pure-extraction behavior.
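As a rough illustration of these two inputs, the sketch below assembles a single prompt from a JSON template and a text. The `<|input|>`, `### Template:`, `### Text:` and `<|output|>` markers are an assumption based on numind's published usage examples; verify them against the official model card before relying on them. The template convention (an empty string for a single value, a list for repeated values), the `build_prompt` helper and the invoice fields are all hypothetical.

```python
import json


def build_prompt(template: dict, text: str) -> str:
    """Assemble a NuExtract-style extraction prompt (marker layout is assumed)."""
    # Pretty-print the template so each field appears on its own line.
    template_str = json.dumps(template, indent=4, ensure_ascii=False)
    return (
        "<|input|>\n"
        "### Template:\n"
        f"{template_str}\n"
        "### Text:\n"
        f"{text}\n\n"
        "<|output|>"
    )


# Hypothetical template: "" marks a single value, [...] marks repeated values.
template = {
    "company": "",
    "invoice_number": "",
    "line_items": [{"description": "", "amount": ""}],
}
prompt = build_prompt(template, "Acme Corp invoice INV-042: 3 widgets, 45.00 EUR.")
print(prompt)
```

With Hugging Face `transformers`, a temperature of 0 amounts to greedy decoding, which `model.generate` selects with `do_sample=False`.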

  • Supports batch processing for multiple documents
  • Implements sliding window attention for handling long sequences
  • Includes robust JSON template-based extraction
  • Optimized for zero-shot performance in multiple languages
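Independently of how the model handles long sequences internally, a common application-side way to keep each request within a fixed context budget is to split a long document into overlapping windows and send the pieces in batches. The sketch below is plain input chunking, not attention-level sliding windows; the `sliding_windows` and `batched` helpers and the window, overlap and batch sizes are hypothetical, and word counts only approximate model tokens.

```python
from typing import Iterator, List


def sliding_windows(text: str, window: int, overlap: int) -> List[str]:
    """Split text into overlapping word windows (word count approximates tokens)."""
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    words = text.split()
    if not words:
        return []
    step = window - overlap
    # Stop once a window would contain only words already covered by the previous one.
    return [
        " ".join(words[start:start + window])
        for start in range(0, max(len(words) - overlap, 1), step)
    ]


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


doc = " ".join(f"w{i}" for i in range(10))
chunks = sliding_windows(doc, window=4, overlap=1)
for batch in batched(chunks, batch_size=2):
    print(batch)
# ['w0 w1 w2 w3', 'w3 w4 w5 w6']
# ['w6 w7 w8 w9']
```

The overlap ensures that a value straddling a window boundary still appears whole in at least one window, at the cost of some duplicated work.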

Core Capabilities

  • Multi-language support across six major European languages
  • Long document processing with sliding window attention
  • Structured information extraction using JSON templates
  • High-accuracy extraction with focus on maintaining original text
  • Batch processing support for efficient document handling
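Because the model aims to keep extracted values as they appear in the source, a cheap application-side sanity check is to confirm that every extracted string occurs verbatim in the input text. The sketch below assumes the model's output is valid JSON shaped like the template; the helper names and sample data are hypothetical, and a flagged value is a candidate for review, not necessarily an error (dates or numbers may be legitimately normalized).

```python
import json
from typing import Any, List


def leaf_strings(node: Any) -> List[str]:
    """Collect every non-empty string leaf from parsed JSON."""
    if isinstance(node, dict):
        return [s for v in node.values() for s in leaf_strings(v)]
    if isinstance(node, list):
        return [s for v in node for s in leaf_strings(v)]
    if isinstance(node, str) and node:
        return [node]
    return []


def unsupported_values(raw_output: str, source_text: str) -> List[str]:
    """Return extracted strings that do not occur verbatim in the source text."""
    data = json.loads(raw_output)
    return [s for s in leaf_strings(data) if s not in source_text]


text = "Acme Corp invoice INV-042: 3 widgets, 45.00 EUR."
output = '{"company": "Acme Corp", "invoice_number": "INV-042", "total": "45 EUR"}'
print(unsupported_values(output, text))  # ['45 EUR']
```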

Frequently Asked Questions

Q: What makes this model unique?

NuExtract-1.5-tiny stands out for its specialized focus on structured information extraction combined with a small footprint (about 0.5 billion parameters, inherited from Qwen2.5-0.5B). Its support for multiple languages and long documents, together with its template-based approach, makes it particularly valuable for automated data extraction tasks.

Q: What are the recommended use cases?

The model is ideal for scenarios requiring structured information extraction from documents, such as processing business documents, academic papers, or any text where specific data needs to be extracted into a structured format. It's particularly useful when dealing with multilingual content or long documents that need precise information extraction.
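When a long document is processed in several pieces, the per-piece results have to be combined into one record. The sketch below uses one simple, hypothetical merge policy: for single values keep the first non-empty one, and for lists append items not already present. It assumes every piece was extracted with the same template; `merge_extractions` and the sample records are illustrative, and real documents may need a policy tailored to each field.

```python
from typing import Any


def merge_extractions(a: Any, b: Any) -> Any:
    """Merge two extraction results shaped like the same template (hypothetical policy)."""
    if isinstance(a, dict) and isinstance(b, dict):
        # Union of keys, preserving a's key order first.
        return {k: merge_extractions(a.get(k), b.get(k)) for k in {**a, **b}}
    if isinstance(a, list) and isinstance(b, list):
        merged = list(a)
        for item in b:
            if item not in merged:  # drop exact duplicates seen in overlapping windows
                merged.append(item)
        return merged
    # Single values: keep the first non-empty value seen.
    return a if a not in (None, "") else b


chunk1 = {"company": "Acme Corp", "invoice_number": "",
          "line_items": [{"description": "widget", "amount": "45.00"}]}
chunk2 = {"company": "", "invoice_number": "INV-042",
          "line_items": [{"description": "widget", "amount": "45.00"},
                         {"description": "shipping", "amount": "5.00"}]}
print(merge_extractions(chunk1, chunk2))
```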
