vdr-2b-multi-v1

Maintained By
llamaindex

Property               Value
Model Type             Visual Document Retrieval
Base Architecture      Qwen2VL
Languages Supported    Italian, Spanish, English, French, German
Model URL              HuggingFace
VRAM Requirements      ~4.4GB (base)

What is vdr-2b-multi-v1?

vdr-2b-multi-v1 is a multilingual embedding model designed for visual document retrieval. Built on MrLight/dse-qwen2-2b-mrl-v1, it encodes document page screenshots into dense vector representations, enabling search and retrieval across multiple languages without OCR or complex data extraction pipelines.

Implementation Details

The model uses bf16 tensors and Matryoshka Representation Learning, which allows vectors to be cut to a third of their size while retaining 98% of embedding quality. It was trained on 500k high-quality multilingual samples using the DSE (Document Screenshot Embedding) approach with hard-mined negatives.
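
A minimal sketch of how a Matryoshka-style vector is typically shortened: keep the leading components and rescale to unit length. The helper name `truncate_embedding` and the toy 6-dimensional vector are hypothetical; the model's actual output dimension is not stated here, and in practice the integration library may handle truncation for you.

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit L2 norm.

    Hypothetical helper for illustration; not part of any library API.
    """
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale in an all-zero vector
    return [x / norm for x in head]

# Toy data: a 6-dimensional vector cut to 2 dimensions (a 3x reduction).
full = [3.0, 4.0, 1.0, 0.5, 0.2, 0.1]
small = truncate_embedding(full, 2)
print(len(full), "->", len(small))
```

Renormalizing after truncation keeps dot products comparable to cosine similarity, which is what most vector stores expect.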

  • Supports 768 image patches with batch size 16 on an NVIDIA T4 GPU
  • Multiple integration options: LlamaIndex, HuggingFace Transformers, SentenceTransformers
  • Smart image resizing capabilities for optimal processing
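
The patch budget and resizing mentioned in the list above can be illustrated with a rough sketch. It assumes each image patch covers 28 x 28 pixels, which is not stated in this document, and the function `fit_to_patch_budget` is a hypothetical helper, not the processor's actual resizing routine.

```python
import math

PATCH = 28  # assumed pixel side of one image patch (not stated in the source)

def fit_to_patch_budget(width, height, max_patches=768, patch=PATCH):
    """Return (new_width, new_height) snapped to the patch grid and
    covering at most `max_patches` patches."""
    if width <= 0 or height <= 0 or max_patches < 1:
        raise ValueError("width, height and max_patches must be positive")
    # First try simply snapping each side to the nearest grid multiple.
    cols = max(1, round(width / patch))
    rows = max(1, round(height / patch))
    if cols * rows > max_patches:
        # Too large: shrink both sides by the same factor, then round down.
        scale = math.sqrt(max_patches * patch * patch / (width * height))
        cols = max(1, math.floor(width * scale / patch))
        rows = max(1, math.floor(height * scale / patch))
        # Guard for extreme aspect ratios, at the cost of some distortion.
        cols = min(cols, max_patches)
        rows = min(rows, max_patches // cols)
    return cols * patch, rows * patch

# Example: an A4 page rendered at roughly 200 dpi.
new_w, new_h = fit_to_patch_budget(1654, 2339)
print(new_w, new_h, (new_w // PATCH) * (new_h // PATCH))
```

Lowering `max_patches` trades detail for memory and speed, so it is the natural knob to turn when a GPU runs out of VRAM at a given batch size.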

Core Capabilities

  • Cross-lingual document retrieval with high accuracy
  • OCR-free visual document understanding
  • Efficient memory usage with configurable vector dimensions
  • Superior performance across text-only, visual-only, and mixed content

Frequently Asked Questions

Q: What makes this model unique?

The model performs cross-lingual document retrieval without OCR, so users can, for example, search German documents with Italian queries. It does this while maintaining high accuracy, with NDCG@5 scores above 95% across languages.
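
For readers unfamiliar with the metric, the sketch below ranks pages by dot product with a query embedding and scores the ranking with NDCG@5. The vectors and relevance labels are toy data, the function names are hypothetical, and the ideal ranking is computed only from the retrieved list, which assumes every relevant page appears in it.

```python
import math

def rank_pages(query_vec, page_vecs):
    """Return page indices sorted by descending dot product with the query."""
    scores = [sum(q * p for q, p in zip(query_vec, vec)) for vec in page_vecs]
    return sorted(range(len(page_vecs)), key=lambda i: scores[i], reverse=True)

def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=5):
    """DCG of the given ranking divided by the DCG of the best ordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal

# Toy data: one query embedding, three page embeddings, page 1 is relevant.
query = [1.0, 0.0]
pages = [[0.0, 1.0], [0.8, 0.6], [0.6, 0.8]]
relevant = {1}

order = rank_pages(query, pages)
relevances = [1 if i in relevant else 0 for i in order]
score = ndcg_at_k(relevances, k=5)
print(order, score)
```

A score of 1.0 means the relevant pages were ranked first; placing a single relevant page second instead would drop the score to about 0.63.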

Q: What are the recommended use cases?

The model is well suited to multilingual document management systems, cross-lingual information retrieval, and document search applications where traditional OCR-based solutions may be impractical or inefficient.
