MonoQwen2-VL-v0.1
| Property | Value |
|---|---|
| Base Model | Qwen2-VL-2B |
| License | Apache 2.0 |
| Paper | MonoT5 Paper |
| Training Data | ColPali train set |
What is MonoQwen2-VL-v0.1?
MonoQwen2-VL-v0.1 is a multimodal reranker finetuned with LoRA from the Qwen2-VL-2B base model. Given an image and a query, it outputs a binary "True" or "False" relevance judgment following the MonoT5 objective, and the probability assigned to "True" can be used as a relevance score. Reranking first-stage retrieval results with it yields an average NDCG@5 of 90.5 across various benchmark datasets.
Implementation Details
The model processes an image and a query together through a fixed prompt template and derives a relevance score from the logits of the "True" and "False" tokens. It runs with the Transformers library and supports optional optimizations such as Flash Attention 2 and bfloat16 precision; a minimal usage sketch follows the list below.
- Built on Qwen2-VL-2B architecture with LoRA fine-tuning
- Implements the MonoT5 objective for relevance scoring
- Supports both CPU and GPU inference with configurable options
- Processes image-query pairs through structured prompt templates
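Below is a minimal inference sketch using the Transformers library. The repository id, prompt wording, image path, and example query are illustrative assumptions; consult the official model card for the exact prompt template and model name.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Model id is an assumption; check the official model card for the exact repository.
model_id = "lightonai/MonoQwen2-VL-v0.1"

processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                 # optional: bfloat16 precision
    # attn_implementation="flash_attention_2",  # optional: Flash Attention 2
    device_map="auto",
)

# Illustrative query and candidate page image.
query = "What percentage of revenue came from subscriptions in 2024?"
image = Image.open("candidate_page.png")

# MonoT5-style pointwise prompt (wording is an assumption; see the model card
# for the exact template): the model answers "True" or "False".
prompt = (
    "Assert the relevance of the previous image document to the following query, "
    f"answer True or False. The query is: {query}"
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt},
        ],
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Compare the next-token logits of the "True" and "False" tokens to obtain a
# relevance probability. If these ids come back as UNK, inspect the tokenizer
# vocabulary for the exact token strings.
with torch.no_grad():
    logits = model(**inputs).logits[:, -1, :]

true_id = processor.tokenizer.convert_tokens_to_ids("True")
false_id = processor.tokenizer.convert_tokens_to_ids("False")
p_true = torch.softmax(logits[:, [true_id, false_id]], dim=-1)[0, 0].item()
print(f"P(relevant) = {p_true:.3f}")
```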
Core Capabilities
- Binary relevance assessment between images and queries
- Reranking of candidate images from first-stage retrievers (see the sketch after this list)
- Probability scoring for relevance decisions
- Significant performance improvements over first-stage retrieval models
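As a sketch of how the reranking step can be wired in, the helper below rescores a list of candidate page images produced by a first-stage retriever and keeps the most relevant ones. It assumes a `relevance_score(query, image)` callable built from the scoring logic in the previous sketch; the file paths and query are placeholders.

```python
from typing import Callable
from PIL import Image

def rerank(
    query: str,
    candidate_paths: list[str],
    score_fn: Callable[[str, Image.Image], float],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Rescore first-stage candidates with a pointwise reranker and keep the
    top_k pages with the highest relevance probability."""
    scored = [(path, score_fn(query, Image.open(path))) for path in candidate_paths]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Usage (with a relevance_score helper wrapping the previous sketch):
# candidates = ["page_001.png", "page_002.png", "page_003.png"]
# top_pages = rerank("subscription revenue in 2024", candidates, relevance_score)
```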
Frequently Asked Questions
Q: What makes this model unique?
The model's unique strength lies in its ability to provide precise relevance scoring for image-query pairs, showing substantial improvements over first-stage retrievers like DSE. Its performance boost of 4.7 points in mean NDCG@5 demonstrates its effectiveness in real-world applications.
Q: What are the recommended use cases?
The model is particularly well-suited for document retrieval systems, visual question answering applications, and any scenario requiring precise reranking of image results. It's especially effective when used in conjunction with first-stage retrievers like DSE or ColPali.