NuExtract-1.5-tiny

numind

NuExtract-1.5-tiny is a 0.5B-parameter fine-tune of Qwen2.5, specialized for structured information extraction, with support for multiple languages and long documents.

Property	Value
Base Model	Qwen2.5-0.5B
Languages	English, French, Spanish, German, Portuguese, Italian
License	Apache 2.0
Hugging Face	numind/NuExtract-1.5-tiny

What is NuExtract-1.5-tiny?

NuExtract-1.5-tiny is a language model fine-tuned specifically for structured information extraction. Built on Qwen2.5-0.5B, it takes a document and a JSON template and returns the template filled in with values found in the document, in any of six supported languages. Its small size keeps inference cheap, and it is notable for handling long documents and for its focus on purely extractive tasks.

Implementation Details

Architecturally, the model is the standard Qwen2.5-0.5B decoder; its specialization for information extraction comes from fine-tuning, not from structural changes. Each request consists of the input text plus a JSON template that defines the structure of the information to extract. Use a temperature of 0 or very close to it: higher values can degrade the model's purely extractive behavior.
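A minimal sketch of a single extraction call in Python, assuming the tiny model uses the same prompt markers (`<|input|>`, `### Template:`, `### Text:`, `<|output|>`) shown in the NuExtract-v1.5 model card examples. The helper names `build_prompt`, `parse_output`, and `extract` are hypothetical. In Hugging Face `transformers`, greedy decoding (`do_sample=False`) corresponds to a temperature of 0. Template fields are left empty (`""` for a single value, `[]` for a list) for the model to fill.

```python
import json


def build_prompt(text: str, template: dict) -> str:
    """Wrap a document and a JSON template in NuExtract-style prompt markers.

    The marker layout is an assumption based on the NuExtract-v1.5 model card.
    """
    template_str = json.dumps(template, indent=4)
    return f"<|input|>\n### Template:\n{template_str}\n### Text:\n{text}\n\n<|output|>"


def parse_output(generated: str) -> dict:
    """Parse the model's completion (the text after <|output|>) as JSON."""
    return json.loads(generated.strip())


def extract(text: str, template: dict, model_name: str = "numind/NuExtract-1.5-tiny") -> dict:
    """Run one extraction with greedy decoding, the equivalent of temperature 0.

    torch and transformers are imported lazily so the helpers above work
    without them installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    inputs = tokenizer(build_prompt(text, template), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return parse_output(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Calling `extract(document, {"name": "", "city": ""})` downloads the weights on first use and returns the filled-in template as a Python dict; `parse_output` raises `json.JSONDecodeError` if the completion is not valid JSON.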

Supports batch processing for multiple documents
Handles long inputs with sliding-window prompting: the document is split into overlapping chunks that are processed in sequence
Uses a JSON template to define the output schema
Optimized for zero-shot performance in multiple languages
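Batching is not NuExtract-specific; the sketch below uses the generic `transformers` pattern for batched generation with decoder-only models, under the assumption that prompts are already formatted as the model expects. `batched` and `generate_batch` are hypothetical helpers. Left padding keeps every prompt flush against the position where generation starts, which is why it is set explicitly.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def generate_batch(prompts: List[str], model, tokenizer, max_new_tokens: int = 512) -> List[str]:
    """Greedy-decode a batch of already-formatted prompts (hypothetical helper)."""
    import torch

    # Decoder-only models are left-padded for batched generation.
    tokenizer.padding_side = "left"
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    enc = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    with torch.no_grad():
        out = model.generate(**enc, max_new_tokens=max_new_tokens, do_sample=False)
    # All prompts share the same padded length, so slicing drops them uniformly.
    return tokenizer.batch_decode(out[:, enc["input_ids"].shape[1]:], skip_special_tokens=True)
```

Grouping documents of similar length into the same batch reduces wasted padding.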

Core Capabilities

Multi-language support across six major European languages
Long-document processing via sliding-window prompting over overlapping chunks
Structured information extraction using JSON templates
Extractive output: values are copied from the original text rather than rephrased
Batch processing support for efficient document handling
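The sliding-window approach described for NuExtract-v1.5 works at the prompt level rather than inside the attention mechanism: the document is split into overlapping token windows, and each window is sent along with the extraction accumulated so far, which the model updates. The sketch below assumes that earlier result is passed under a `### Current:` marker, as in the NuExtract-v1.5 model card example; `split_windows`, `build_window_prompt`, and `sliding_window_extract` are hypothetical names, and `generate` stands for any function that sends a prompt to the model and returns the raw completion.

```python
import json
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_windows(tokens: Sequence[T], window_size: int, overlap: int) -> List[List[T]]:
    """Split tokens into windows that share `overlap` tokens with the previous window."""
    if not 0 <= overlap < window_size:
        raise ValueError("need 0 <= overlap < window_size")
    step = window_size - overlap
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + window_size]))
        if start + window_size >= len(tokens):
            return windows
        start += step


def build_window_prompt(template: dict, current: dict, chunk_text: str) -> str:
    """Prompt for one window; the `### Current:` marker is an assumption."""
    return (
        "<|input|>\n### Template:\n" + json.dumps(template, indent=4)
        + "\n### Current:\n" + json.dumps(current, indent=4)
        + "\n### Text:\n" + chunk_text + "\n\n<|output|>"
    )


def sliding_window_extract(chunks: List[str], template: dict,
                           generate: Callable[[str], str]) -> dict:
    """Update the extraction window by window, starting from the empty template."""
    current = template
    for chunk in chunks:
        raw = generate(build_window_prompt(template, current, chunk))
        try:
            current = json.loads(raw)
        except json.JSONDecodeError:
            pass  # keep the last valid extraction if this window's output is broken
    return current
```

With a Hugging Face tokenizer, windows can be built from `tokenizer.tokenize(document)` and turned back into text with `tokenizer.convert_tokens_to_string`. Window size and overlap are tuning choices, and keeping the previous result when a window returns invalid JSON is one simple recovery strategy, not the only option.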

Frequently Asked Questions

Q: What makes this model unique?

NuExtract-1.5-tiny combines a narrow specialization in structured information extraction with a small footprint: at 0.5B parameters it is cheap to run, yet it supports six languages and long documents. Together with its template-based interface, this makes it well suited to automated data-extraction pipelines.

Q: What are the recommended use cases?

The model is ideal for scenarios requiring structured information extraction from documents, such as processing business documents, academic papers, or any text where specific data needs to be extracted into a structured format. It's particularly useful when dealing with multilingual content or long documents that need precise information extraction.
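For the business-document case, a template for an invoice might look like the one below; the field names are illustrative, not taken from the source. Because the model is meant to copy values from the text, one cheap sanity check is to flag extracted strings that do not occur verbatim in the document. `iter_strings` and `non_verbatim_values` are hypothetical helpers, and exact substring matching will also flag harmless differences such as collapsed whitespace.

```python
from typing import Any, Iterator, List

# Illustrative template: "" marks a single value, [] a list, nested dicts group fields.
INVOICE_TEMPLATE = {
    "invoice_number": "",
    "date": "",
    "vendor": {"name": "", "address": ""},
    "line_items": [],
}


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every non-empty string leaf in a nested JSON value."""
    if isinstance(value, str):
        if value:
            yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from iter_strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from iter_strings(v)


def non_verbatim_values(extraction: Any, source: str) -> List[str]:
    """Return extracted strings that do not appear word-for-word in `source`."""
    return [s for s in iter_strings(extraction) if s not in source]
```

A non-empty result does not prove the extraction is wrong, but it points to values worth reviewing before they enter downstream systems.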