jina-reranker-m0

Maintained By: jinaai

Property: Value
Parameter Count: 2.4B
Model Type: Vision Language Model
Architecture: Qwen2-VL-2B with LoRA adaptation
Context Length: 10,240 tokens
License: CC BY-NC 4.0

What is jina-reranker-m0?

jina-reranker-m0 is a multilingual, multimodal reranker model that ranks documents using both their text and their visual content, across 29+ languages. Built on the Qwen2-VL-2B architecture, it is designed to understand and rank documents with mixed content types, including text, figures, tables, and various layouts.

Implementation Details

The model uses a decoder-only vision language model architecture built on Qwen2-VL-2B. It is fine-tuned with LoRA and adds a specialized MLP head that generates ranking scores. The architecture supports dynamic image resolution, from 56×56 pixels up to 4K.

  • Base Architecture: Qwen2-VL-2B-Instruct with vision encoder and projection layer
  • Fine-tuning: Low-Rank Adaptation (LoRA) techniques
  • Training: Optimized with pairwise and listwise ranking losses
  • Token Capacity: 10,240 tokens for query + document

Core Capabilities

  • Multimodal Document Processing: Handles text, images, tables, and mixed layouts
  • Extensive Language Support: Works across 29+ languages with bidirectional capabilities
  • Dynamic Image Processing: Supports varied image resolutions, up to 768 × 28 × 28 pixels (768 patches of 28 × 28 pixels)
  • Zero-shot Performance: Effective on unseen domains without specific fine-tuning
  • Code Search Enhancement: Specialized capabilities for programming language search
  • Long Context Understanding: Processes up to 10,240 tokens for query + document
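The image figures above can be worked through as simple arithmetic. The sketch below reads "768 × 28 × 28" as an upper bound on total pixels per image; that reading is an assumption. The helper names are hypothetical, and the model's real preprocessing may resize and round differently.

```python
import math

PATCH_SIDE = 28    # pixels per patch side, from "28 × 28"
MAX_PATCHES = 768  # from "768 × 28 × 28"
MAX_PIXELS = MAX_PATCHES * PATCH_SIDE * PATCH_SIDE  # 602,112 pixels


def downscale_factor(width: int, height: int) -> float:
    """Factor (at most 1.0) applied to both sides so width * height fits MAX_PIXELS."""
    return min(1.0, math.sqrt(MAX_PIXELS / (width * height)))


def patch_count(width: int, height: int) -> int:
    """Number of 28 × 28 patches needed to cover an image of the given size."""
    return math.ceil(width / PATCH_SIDE) * math.ceil(height / PATCH_SIDE)


print(MAX_PIXELS)                    # 602112
print(patch_count(56, 56))           # 4
print(downscale_factor(3840, 2160))  # below 1.0: a 4K image exceeds the assumed budget
```

Under this reading, the 56×56 minimum covers four patches, and a 4K image would be scaled down before it is encoded.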

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to process both visual and textual content across multiple languages, combined with its large context window and dynamic image resolution support, sets it apart from traditional rerankers. It's particularly notable for its zero-shot domain transfer capabilities and enhanced performance in code search tasks.

Q: What are the recommended use cases?

The model excels in ranking visual documents, multilingual content reranking, technical documentation search, code repository search, and processing long documents. It's ideal for applications requiring understanding of mixed content types and cross-lingual document ranking.
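In all of these use cases the reranker fills the same slot: it scores each (query, document) pair and the candidates are then sorted by score. The sketch below shows that flow with a toy word-overlap scorer standing in for the model. `rerank` and `overlap_score` are hypothetical names for illustration and are not part of any jina-reranker-m0 API.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank(
    query: str,
    documents: Sequence[str],
    score: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return documents sorted best-first."""
    scored = [(doc, score(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


def overlap_score(query: str, document: str) -> float:
    """Toy stand-in scorer: fraction of query words that appear in the document."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d) / len(q) if q else 0.0


docs = [
    "Recipe for apple pie",
    "LoRA fine-tuning guide",
    "Fine-tuning vision language models with LoRA",
]
for doc, s in rerank("lora fine-tuning guide", docs, overlap_score):
    print(f"{s:.2f}  {doc}")
```

In a real pipeline, the `score` callable would wrap the reranker model, and the candidate list would usually come from a faster first-stage retriever.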
