MonoQwen2-VL-v0.1
| Property | Value |
|---|---|
| Base Model | Qwen2-VL-2B |
| License | Apache 2.0 |
| Paper | MonoT5 Paper |
| Training Data | ColPali train set |
What is MonoQwen2-VL-v0.1?
MonoQwen2-VL-v0.1 is a multimodal reranker finetuned with LoRA from the Qwen2-VL-2B base model. Given an image and a query, it outputs a binary "True" or "False" relevance judgment following the MonoT5 objective, and the probability assigned to "True" can be used as a relevance score. Reranking first-stage retrieval results with it yields an average NDCG@5 of 90.5 across various benchmark datasets.
Implementation Details
The model processes an image and a query together through a fixed prompt template and derives a relevance score from the logits of the "True" and "False" tokens. It runs with the Transformers library and supports optional optimizations such as Flash Attention 2 and bfloat16 precision; a minimal usage sketch follows the list below.
- Built on Qwen2-VL-2B architecture with LoRA fine-tuning
- Implements the MonoT5 objective for relevance scoring
- Supports both CPU and GPU inference with configurable options
- Processes image-query pairs through structured prompt templates
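Below is a minimal inference sketch using the Transformers library. The repository id, prompt wording, image path, and example query are illustrative assumptions; consult the official model card for the exact prompt template and model name.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Model id is an assumption; check the official model card for the exact repository.
model_id = "lightonai/MonoQwen2-VL-v0.1"

processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                 # optional: bfloat16 precision
    # attn_implementation="flash_attention_2",  # optional: Flash Attention 2
    device_map="auto",
)

# Illustrative query and candidate page image.
query = "What percentage of revenue came from subscriptions in 2024?"
image = Image.open("candidate_page.png")

# MonoT5-style pointwise prompt (wording is an assumption; see the model card
# for the exact template): the model answers "True" or "False".
prompt = (
    "Assert the relevance of the previous image document to the following query, "
    f"answer True or False. The query is: {query}"
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt},
        ],
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Compare the next-token logits of the "True" and "False" tokens to obtain a
# relevance probability. If these ids come back as UNK, inspect the tokenizer
# vocabulary for the exact token strings.
with torch.no_grad():
    logits = model(**inputs).logits[:, -1, :]

true_id = processor.tokenizer.convert_tokens_to_ids("True")
false_id = processor.tokenizer.convert_tokens_to_ids("False")
p_true = torch.softmax(logits[:, [true_id, false_id]], dim=-1)[0, 0].item()
print(f"P(relevant) = {p_true:.3f}")
```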
Core Capabilities
- Binary relevance assessment between images and queries
- Reranking of candidate images from first-stage retrievers (see the sketch after this list)
- Probability scoring for relevance decisions
- Significant performance improvements over first-stage retrieval models
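As a sketch of how the reranking step can be wired in, the helper below rescores a list of candidate page images produced by a first-stage retriever and keeps the most relevant ones. It assumes a `relevance_score(query, image)` callable built from the scoring logic in the previous sketch; the file paths and query are placeholders.

```python
from typing import Callable
from PIL import Image

def rerank(
    query: str,
    candidate_paths: list[str],
    score_fn: Callable[[str, Image.Image], float],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Rescore first-stage candidates with a pointwise reranker and keep the
    top_k pages with the highest relevance probability."""
    scored = [(path, score_fn(query, Image.open(path))) for path in candidate_paths]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Usage (with a relevance_score helper wrapping the previous sketch):
# candidates = ["page_001.png", "page_002.png", "page_003.png"]
# top_pages = rerank("subscription revenue in 2024", candidates, relevance_score)
```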
Frequently Asked Questions
Q: What makes this model unique?
The model's unique strength lies in its ability to provide precise relevance scoring for image-query pairs, showing substantial improvements over first-stage retrievers like DSE. Its performance boost of 4.7 points in mean NDCG@5 demonstrates its effectiveness in real-world applications.
Q: What are the recommended use cases?
The model is particularly well-suited for document retrieval systems, visual question answering applications, and any scenario requiring precise reranking of image results. It's especially effective when used in conjunction with first-stage retrievers like DSE or ColPali.