MonoQwen2-VL-v0.1

Maintained By
lightonai

MonoQwen2-VL-v0.1

PropertyValue
Base ModelQwen2-VL-2B
LicenseApache 2.0
PaperMonoT5 Paper
Training DataColPali train set

What is MonoQwen2-VL-v0.1?

MonoQwen2-VL-v0.1 is a specialized multimodal reranker model that has been finetuned using LoRA from the Qwen2-VL-2B base model. It excels at determining the relevance between images and queries, outputting binary "True" or "False" decisions based on the MonoT5 objective. The model has demonstrated significant improvements in retrieval performance, achieving an impressive average NDCG@5 score of 90.5 across various benchmark datasets.

Implementation Details

The model employs a sophisticated architecture that processes both image and text inputs through a carefully crafted prompt structure. It uses the Transformers library and can be deployed with optional optimizations like Flash Attention 2 and bfloat16 precision.

  • Built on Qwen2-VL-2B architecture with LoRA fine-tuning
  • Implements MonoT5 objective for relevance scoring
  • Supports both CPU and GPU inference with configurable options
  • Processes image-query pairs through structured prompt templates

Core Capabilities

  • Binary relevance assessment between images and queries
  • Reranking of candidate images from first-stage retrievers
  • Probability scoring for relevance decisions
  • Significant performance improvements over base retrieval models

Frequently Asked Questions

Q: What makes this model unique?

The model's unique strength lies in its ability to provide precise relevance scoring for image-query pairs, showing substantial improvements over first-stage retrievers like DSE. Its performance boost of 4.7 points in mean NDCG@5 demonstrates its effectiveness in real-world applications.

Q: What are the recommended use cases?

The model is particularly well-suited for document retrieval systems, visual question answering applications, and any scenario requiring precise reranking of image results. It's especially effective when used in conjunction with first-stage retrievers like DSE or ColPali.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.