vdr-2b-multi-v1
Property | Value |
---|---|
Model Type | Visual Document Retrieval |
Base Architecture | Qwen2VL |
Languages Supported | Italian, Spanish, English, French, German |
Model URL | HuggingFace |
VRAM Requirements | ~4.4GB (base) |
What is vdr-2b-multi-v1?
vdr-2b-multi-v1 is a multilingual embedding model for visual document retrieval. Built on MrLight/dse-qwen2-2b-mrl-v1, it encodes document page screenshots into dense vector representations, so pages can be searched across multiple languages without OCR or a separate data extraction pipeline.
Implementation Details
The model uses bf16 tensors and Matryoshka Representation Learning, which lets vectors be cut to a third of their full size while retaining 98% of embedding quality. It was trained on 500k high-quality multilingual samples using the DSE approach with hard-mined negatives.
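With Matryoshka-style embeddings, a shorter vector is usually obtained by keeping the leading components and rescaling to unit length. The following is a minimal sketch of that idea in plain Python, not the model's own code. The sizes 1536 and 512 are illustrative assumptions chosen to match the 3x ratio, and `truncate_embedding` is a hypothetical helper name.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]

# Stand-in for a real embedding (assumed 1536 dimensions, not stated in the source).
full = [math.sin(i + 1) for i in range(1536)]

# Assumed reduced size: 512 dimensions, one third of the stand-in vector.
small = truncate_embedding(full, 512)
print(len(full), len(small))
```

Renormalizing after truncation keeps cosine similarity and dot product interchangeable for the shortened vectors.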
- Supports 768 image patches at batch size 16 on an NVIDIA T4 GPU
- Multiple integration options: LlamaIndex, HuggingFace Transformers, SentenceTransformers
- Smart image resizing capabilities for optimal processing
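One way to picture the patch budget and resizing together is a function that shrinks a page image until it fits within 768 patches. This is a rough sketch only, not the model's actual preprocessing. It assumes each patch covers a 28x28 pixel area (an assumption borrowed from how Qwen2-VL-style processors are commonly configured), and `fit_to_patch_budget` is a hypothetical helper.

```python
import math

PATCH = 28         # assumed pixel side covered by one image patch
MAX_PATCHES = 768  # patch budget quoted in the list above

def fit_to_patch_budget(width, height, patch=PATCH, max_patches=MAX_PATCHES):
    """Return (width, height) snapped to patch multiples and within the budget."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Snap each side to the nearest patch multiple, at least one patch.
    w = max(patch, round(width / patch) * patch)
    h = max(patch, round(height / patch) * patch)
    if (w // patch) * (h // patch) > max_patches:
        # Shrink both sides by the same factor, rounding down to stay in budget.
        scale = math.sqrt(width * height / (max_patches * patch * patch))
        w = max(patch, math.floor(width / scale / patch) * patch)
        h = max(patch, math.floor(height / scale / patch) * patch)
    if (w // patch) * (h // patch) > max_patches:
        # Very elongated images can still overflow after clamping; cap the long side.
        if w >= h:
            w = (max_patches // (h // patch)) * patch
        else:
            h = (max_patches // (w // patch)) * patch
    return w, h

# Example: an A4 page scanned at roughly 200 dpi.
new_w, new_h = fit_to_patch_budget(1654, 2339)
print(new_w, new_h, (new_w // PATCH) * (new_h // PATCH))
```

Rounding down during the shrink step trades a few unused patches for a guarantee that the budget is never exceeded.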
Core Capabilities
- Cross-lingual document retrieval with high accuracy
- OCR-free visual document understanding
- Efficient memory usage with configurable vector dimensions
- Strong retrieval performance on text-only, visual-only, and mixed-content pages
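Retrieval over such embeddings typically comes down to ranking page vectors by similarity to a query vector. The sketch below uses tiny hand-written vectors in place of real model output. The page identifiers, the vectors, and the `top_k` helper are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, page_vecs, k=5):
    """Return the k (page_id, score) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), page_id) for page_id, vec in page_vecs.items()]
    scored.sort(reverse=True)
    return [(page_id, score) for score, page_id in scored[:k]]

# Toy 3-dimensional stand-ins for page embeddings in different languages.
pages = {
    "report_de_p1": [0.9, 0.1, 0.0],
    "invoice_it_p3": [0.1, 0.9, 0.2],
    "manual_fr_p7": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in for an embedded text query

results = top_k(query, pages, k=2)
for page_id, score in results:
    print(page_id, round(score, 3))
```

Because queries and pages share one embedding space, the ranking step itself does not need to know which language either side is in.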
Frequently Asked Questions
Q: What makes this model unique?
The model performs cross-lingual document retrieval without OCR, which sets it apart: users can, for example, search German documents with Italian queries while maintaining high accuracy (NDCG@5 scores above 95% across languages).
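For readers unfamiliar with the metric, NDCG@5 compares the discounted gain of the top five results with that of an ideal ordering. The sketch below is one common formulation using linear gain. The function names and relevance lists are illustrative, and the actual evaluation setup behind the quoted scores is not described here.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_rels, judged_rels, k=5):
    """NDCG@k for one query.

    ranked_rels: relevance of each returned page, in rank order.
    judged_rels: relevance grades of all judged pages for the query,
                 used to build the ideal ranking.
    """
    ideal = dcg_at_k(sorted(judged_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0

# One relevant page per query, a hypothetical but common setup.
hit_at_rank_1 = ndcg_at_k([1, 0, 0, 0, 0], [1])
hit_at_rank_2 = ndcg_at_k([0, 1, 0, 0, 0], [1])
print(hit_at_rank_1, hit_at_rank_2)
```

With a single relevant page, the score depends only on the rank at which that page appears, and a benchmark figure is the average over all queries.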
Q: What are the recommended use cases?
The model is well suited to multilingual document management systems, cross-lingual information retrieval, and document search applications where traditional OCR-based solutions are impractical or inefficient.