jina-reranker-m0
| Property | Value |
|---|---|
| Parameter Count | 2.4B |
| Model Type | Vision Language Model |
| Architecture | Qwen2-VL-2B with LoRA adaptation |
| Context Length | 10,240 tokens |
| License | CC BY-NC 4.0 |
What is jina-reranker-m0?
jina-reranker-m0 is a multilingual, multimodal reranker that ranks documents containing both text and visual content across 29+ languages. Built on the Qwen2-VL-2B architecture, it is designed to understand and rank documents with mixed content types, including text, figures, tables, and varied layouts.
Implementation Details
The model uses a decoder-only vision language model architecture with Qwen2-VL-2B as its foundation. It is fine-tuned with LoRA and adds a specialized MLP head that maps the decoder output to a ranking score. The architecture supports dynamic image resolution, from 56×56 pixels up to 4K.
- Base Architecture: Qwen2-VL-2B-Instruct with vision encoder and projection layer
- Fine-tuning: Low-Rank Adaptation (LoRA) techniques
- Training: Optimized with pairwise and listwise ranking losses
- Token Capacity: 10,240 tokens for query + document
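The pairwise and listwise ranking losses mentioned above can be sketched in plain Python. This is an illustrative RankNet-style pairwise logistic loss and a ListNet-style listwise cross-entropy, not the model's exact training objective; the scores below are hypothetical.

```python
import math

def pairwise_ranking_loss(pos_score: float, neg_score: float) -> float:
    """RankNet-style pairwise logistic loss: penalizes the model when a
    relevant (positive) document is not scored above an irrelevant one."""
    # loss = log(1 + exp(-(s_pos - s_neg))); small when pos_score >> neg_score
    return math.log1p(math.exp(-(pos_score - neg_score)))

def listwise_softmax_loss(scores: list[float], target_index: int) -> float:
    """ListNet-style listwise loss with one relevant document: negative log
    of the softmax probability assigned to the relevant document."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return -math.log(exps[target_index] / sum(exps))

# A well-ordered pair incurs a small loss; an inverted pair a large one.
good = pairwise_ranking_loss(2.0, -1.0)   # positive ranked above negative
bad = pairwise_ranking_loss(-1.0, 2.0)    # positive ranked below negative
```

Both losses push the score of the relevant document above the others; the listwise variant does so over the whole candidate list at once rather than one pair at a time.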
Core Capabilities
- Multimodal Document Processing: Handles text, images, tables, and mixed layouts
- Extensive Language Support: Works across 29+ languages with bidirectional capabilities
- Dynamic Image Processing: Supports varied image resolutions, tiling each image into up to 768 patches of 28×28 pixels
- Zero-shot Performance: Effective on unseen domains without specific fine-tuning
- Code Search Enhancement: Specialized capabilities for programming language search
- Long Context Understanding: Processes query-plus-document inputs up to 10,240 tokens
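To make the dynamic-resolution budget above concrete, the sketch below estimates how many 28×28 patches an image occupies and whether it fits the 768-patch cap. The "downscale if over budget" rule is a simplified assumption for illustration, not the model's exact preprocessing.

```python
import math

PATCH = 28          # patch side length in pixels (from the model card)
MAX_PATCHES = 768   # per-image patch budget (from the model card)

def patch_grid(width: int, height: int) -> tuple[int, int]:
    """Number of 28x28 patches along each axis (partial patches round up)."""
    return math.ceil(width / PATCH), math.ceil(height / PATCH)

def fits_budget(width: int, height: int) -> bool:
    """True if the image fits the 768-patch budget at native resolution.
    Larger images would be downscaled first (simplified assumption)."""
    cols, rows = patch_grid(width, height)
    return cols * rows <= MAX_PATCHES

# A 1024x768 image: ceil(1024/28)=37 cols x ceil(768/28)=28 rows = 1036
# patches, so it exceeds the budget and would be downscaled.
```

This is why the supported range starts at 56×56 (a 2×2 patch grid) and why very large images are resized rather than processed at full resolution.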
Frequently Asked Questions
Q: What makes this model unique?
Its ability to process both visual and textual content across multiple languages, combined with a large context window and dynamic image resolution support, sets it apart from text-only rerankers. It is also notable for zero-shot domain transfer and strong performance on code search tasks.
Q: What are the recommended use cases?
The model excels in ranking visual documents, multilingual content reranking, technical documentation search, code repository search, and processing long documents. It's ideal for applications requiring understanding of mixed content types and cross-lingual document ranking.