dse-qwen2-2b-mrl-v1

MrLight

A bi-encoder model that encodes document screenshots into dense vectors for retrieval. It supports English and French and scores 85.8 nDCG@5 on the ViDoRe leaderboard.

Property	Value
Base Model	Qwen2-VL-2B-Instruct
License	Apache-2.0
Languages	English, French
Research Paper	DSE Paper

What is dse-qwen2-2b-mrl-v1?

DSE-Qwen2-2b-MRL-V1 is a bi-encoder model for document screenshot embedding (DSE). It encodes documents in their original visual form, so text, images, and layout are all preserved in the input rather than being reduced to parsed text. The model scores 85.8 nDCG@5 on the ViDoRe leaderboard.
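For readers unfamiliar with the metric, here is a minimal sketch of how nDCG@5 is computed for a single query. It assumes linear gain and a relevance list that covers every relevant document. The function names are illustrative, and the leaderboard's own evaluation code may differ in detail.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results (ranks start at 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=5):
    """DCG of the ranking as returned, divided by the DCG of the ideal ordering.

    Assumption: `relevances` lists the relevance label of every retrieved
    document in ranked order and includes all relevant documents.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical query with one relevant page, returned at rank 2.
print(round(ndcg_at_k([0, 1, 0, 0, 0], k=5), 4))  # 0.6309
```

A leaderboard score such as 85.8 is this per-query value averaged over all queries and expressed as a percentage.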

Implementation Details

The model supports variable embedding dimensions and variable input image sizes. It is built on Qwen2-VL-2B-Instruct and uses its vision encoder, and it can be fitted to different GPU memory budgets by adjusting the input image resolution.
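To see why image resolution drives memory use, the sketch below estimates how many visual tokens one screenshot produces. It rests on an assumption not stated in this card: that Qwen2-VL preprocessing snaps each image side to a multiple of 28 pixels and emits one token per 28x28 area, optionally scaling the image down to fit a pixel budget. The function and parameter names are hypothetical.

```python
import math

FACTOR = 28  # assumption: one visual token per 28x28 pixel area in Qwen2-VL

def estimate_visual_tokens(width, height, max_pixels=None):
    """Rough visual-token count for an image, optionally capped by a pixel budget."""
    # Snap each side to the nearest multiple of FACTOR (at least one cell).
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    if max_pixels is not None and w * h > max_pixels:
        # Shrink both sides by the same ratio so the area fits the budget.
        scale = math.sqrt((width * height) / max_pixels)
        w = max(FACTOR, math.floor(width / scale / FACTOR) * FACTOR)
        h = max(FACTOR, math.floor(height / scale / FACTOR) * FACTOR)
    return (w // FACTOR) * (h // FACTOR)

print(estimate_visual_tokens(1344, 1344))                             # 2304
print(estimate_visual_tokens(1344, 1344, max_pixels=1024 * 28 * 28))  # 1024
```

Under these assumptions, capping the pixel budget cuts the sequence length the encoder has to process, which is where the memory saving comes from.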

Accepts both document screenshots and text-only inputs
Uses FlashAttention-2 for improved efficiency
Allows the embedding dimension to be adjusted to trade retrieval quality against efficiency
Trained on multiple datasets, including Docmatix-IR and MS MARCO
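The dimension adjustment listed above can be sketched as follows, assuming the "mrl" in the model name refers to Matryoshka representation learning, where a shorter embedding is obtained by keeping the leading components and re-normalizing. The helper name is hypothetical, and the short vector stands in for a real model output.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    # Guard against an all-zero prefix, which cannot be normalized.
    return [x / norm for x in head] if norm > 0 else list(head)

# Toy 3-dimensional vector standing in for a full-size embedding.
print(truncate_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

Queries and documents must be truncated to the same dimension before their similarity is computed.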

Core Capabilities

Document screenshot embedding generation
Multi-modal document understanding
Dense vector representation for efficient retrieval
Cross-lingual document processing (English and French)
Flexible dimension and image size adaptation
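As an illustration of how dense vectors are used for retrieval, the sketch below ranks a few pages by cosine similarity to a query vector. The vectors are toy values rather than model outputs, and the function names and page identifiers are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_documents(query_vec, doc_vecs, top_k=5):
    """Return up to top_k (doc_id, score) pairs, highest score first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 2-dimensional embeddings for three hypothetical screenshots.
pages = {"page_a": [0.9, 0.1], "page_b": [0.0, 1.0], "page_c": [0.5, 0.5]}
for doc_id, score in rank_documents([1.0, 0.0], pages):
    print(doc_id, round(score, 3))
```

In a real system the brute-force loop would usually be replaced by a vector index, but the scoring idea is the same.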

Frequently Asked Questions

Q: What makes this model unique?

The model embeds document screenshots directly, so visual structure and layout are preserved. PDFs, webpages, and slides can all be handled the same way, without the information loss that content parsing introduces.

Q: What are the recommended use cases?

The model is well suited to document retrieval systems, particularly those that mix document formats. It is especially useful for applications that need visual document understanding, cross-document search, or retrieval across varied document formats.