dse-qwen2-2b-mrl-v1

Maintained By: MrLight

DSE-QWen2-2b-MRL-V1

Property          Value
Base Model        Qwen2-VL-2B-Instruct
License           Apache-2.0
Languages         English, French
Research Paper    DSE Paper

What is dse-qwen2-2b-mrl-v1?

DSE-QWen2-2b-MRL-V1 is a bi-encoder model for document screenshot embedding (DSE). It encodes documents directly in their original visual form, so text, images, and layout are all preserved rather than lost in a parsing step. The model scores 85.8 nDCG@5 on the ViDoRe leaderboard.
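For readers unfamiliar with the metric, here is a minimal sketch of how nDCG@5 is computed for a single query. It is not the leaderboard's evaluation code: the function names and the toy relevance labels are made up for illustration, and the linear-gain form of DCG is assumed. Leaderboards typically average this per-query value over all queries and report it as a percentage.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=5):
    """DCG of the actual ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical query: the only relevant page was returned at rank 2.
score = ndcg_at_k([0, 1, 0, 0, 0], k=5)
print(round(score, 4))
```

A ranking that places the relevant page first scores 1.0, and the score drops logarithmically as the page moves down the list.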

Implementation Details

The model supports variable embedding dimensions and variable input image sizes. It is built on Qwen2-VL (base model Qwen2-VL-2B-Instruct), and the input image resolution can be lowered to fit tighter GPU memory budgets.

  • Supports both document screenshots and text-only inputs
  • Supports FlashAttention-2 for improved efficiency
  • Allows the embedding dimension to be reduced, trading some accuracy for lower storage and compute cost
  • Trained on multiple datasets including Docmatix-IR and MS MARCO
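The dimension adjustment listed above can be sketched as follows. This assumes the "MRL" in the model name refers to Matryoshka-style embeddings, where a prefix of the full vector is kept and re-normalized. The helper `truncate_and_normalize` and the toy vector are hypothetical and are not taken from the model's own code.

```python
import math

def truncate_and_normalize(embedding, dimension):
    """Keep the first `dimension` components, then rescale to unit L2 norm."""
    if not 0 < dimension <= len(embedding):
        raise ValueError("dimension must be between 1 and len(embedding)")
    head = embedding[:dimension]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

# Toy stand-in for a full-size embedding (real vectors are much longer).
full = [3.0, 4.0, 12.0, 0.5]
short = truncate_and_normalize(full, 2)
print(short)
```

Queries and documents need to be truncated to the same dimension before comparison. Smaller dimensions shrink the index and speed up search, usually at some cost in retrieval quality.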

Core Capabilities

  • Document screenshot embedding generation
  • Multi-modal document understanding
  • Dense vector representation for efficient retrieval
  • Cross-lingual document processing (English and French)
  • Flexible dimension and image size adaptation
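As a rough illustration of dense retrieval with a bi-encoder, the sketch below ranks precomputed document vectors against a query vector by cosine similarity. The vectors, file names, and the `rank_documents` helper are invented for the example. In practice the vectors would come from the model, and a vector index would replace the linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical screenshot embeddings, computed once and stored ahead of time.
docs = {
    "slide_07.png": [0.9, 0.1, 0.0],
    "report_p3.png": [0.1, 0.9, 0.2],
    "page_home.png": [0.5, 0.5, 0.5],
}
query = [1.0, 0.2, 0.0]  # hypothetical query embedding
results = rank_documents(query, docs, top_k=2)
print(results)
```

Because documents are encoded independently of the query, their embeddings can be built offline, and only the query needs to be encoded at search time.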

Frequently Asked Questions

Q: What makes this model unique?

The model embeds document screenshots directly, preserving their visual structure and layout. Because PDFs, webpages, and slides are all handled as images, no information is lost to a format-specific parsing step.

Q: What are the recommended use cases?

The model is well suited to document retrieval systems, particularly those that index mixed-format collections. It is especially useful for applications that need visual document understanding, cross-document search, or retrieval across several document formats.
