ColQwen2-v0.1
Property | Value |
---|---|
License | MIT |
Base Model | Qwen2-VL-2B-Instruct |
Research Paper | ColPali: Efficient Document Retrieval with Vision Language Models |
Primary Language | English |
What is ColQwen2-v0.1?
ColQwen2-v0.1 is a visual retrieval model that extends Qwen2-VL-2B-Instruct with a ColBERT-style late-interaction strategy for document indexing. It embeds both text queries and document page images as multi-vector representations, so documents can be retrieved directly from their visual features.
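The multi-vector, ColBERT-style matching described above can be illustrated with a minimal late-interaction (MaxSim) scoring sketch. This is not the model's own code: the 2-dimensional embeddings are made-up values, and the names `maxsim_score`, `page_a`, and `page_b` are hypothetical. The real model produces much larger vectors, one per query token and per image patch.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: each query vector is matched to its
    best-scoring page vector, and the per-query maxima are summed."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


# Toy embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]                  # one vector per query token
page_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]     # one vector per image patch
page_b = [[0.1, 0.2], [0.3, 0.1]]

scores: Dict[str, float] = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
best_page = max(scores, key=scores.get)
print(scores, best_page)
```

Because page vectors do not depend on the query, they can be computed once at indexing time; only the query vectors and the inexpensive max-and-sum step are needed at search time.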
Implementation Details
The model is trained with LoRA adapters (alpha=32, r=32) on the transformer layers, in bfloat16 format, using the paged_adamw_8bit optimizer. It accepts dynamic image resolutions without forced resizing, with the resolution capped so that an image yields at most 768 patches.
- Trained on 127,460 query-page pairs
- Uses data parallelism across 8 GPUs
- Learning rate of 5e-5 with linear decay
- 2.5% warmup steps
- Batch size of 32
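To make the learning-rate settings above concrete, here is a rough sketch of a linear warmup followed by linear decay. Several details are assumptions rather than facts from the source: that warmup rises linearly from zero, that 32 is the global batch size, and that training makes a single pass over the 127,460 pairs (the epoch count is not stated). The function name `learning_rate` is hypothetical.

```python
import math

PEAK_LR = 5e-5          # learning rate from the list above
WARMUP_RATIO = 0.025    # 2.5% warmup steps
NUM_PAIRS = 127_460     # query-page pairs
BATCH_SIZE = 32         # assumed to be the global batch size

# Assumption: one pass over the data; the source does not give the epoch count.
TOTAL_STEPS = math.ceil(NUM_PAIRS / BATCH_SIZE)


def learning_rate(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    warmup_steps = max(1, round(total_steps * WARMUP_RATIO))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0, total_steps - step) / decay_steps


for step in (0, 50, 100, 2042, TOTAL_STEPS):
    print(step, learning_rate(step))
```

Under these assumptions the schedule has 3,984 steps, of which the first 100 are warmup.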
Core Capabilities
- Dynamic image resolution processing
- Multi-vector representations for both text and images
- Efficient document indexing and retrieval
- Potential zero-shot generalization to non-English languages
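As an illustration of dynamic image resolution under a 768-patch budget, the sketch below computes a patch grid that keeps a page's aspect ratio roughly intact. It is not the model's actual preprocessing: the 28-pixel patch side is an assumption, and the function name `patch_grid` is hypothetical.

```python
import math
from typing import Tuple

PATCH_PIXELS = 28    # assumed side length (in pixels) covered by one image patch
MAX_PATCHES = 768    # patch budget stated in the model description


def patch_grid(width: int, height: int,
               patch: int = PATCH_PIXELS,
               max_patches: int = MAX_PATCHES) -> Tuple[int, int]:
    """Return (columns, rows) of patches for an image of the given pixel size."""
    cols = max(1, round(width / patch))
    rows = max(1, round(height / patch))
    if cols * rows > max_patches:
        # Shrink both sides by the same factor so the grid fits the budget.
        scale = math.sqrt(cols * rows / max_patches)
        cols = max(1, math.floor(cols / scale))
        rows = max(1, math.floor(rows / scale))
        # Very elongated pages can still overflow after the clamp to 1;
        # in that case trim the longer side.
        if cols * rows > max_patches:
            if rows >= cols:
                rows = max_patches // cols
            else:
                cols = max_patches // rows
    return cols, rows


small = patch_grid(448, 336)      # already within budget, left unchanged
page = patch_grid(1240, 1754)     # roughly an A4 page scanned at 150 DPI
print(small, page, page[0] * page[1])
```

Small images keep their native grid, while large pages are scaled down only as far as the budget requires, instead of every input being forced to one fixed size.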
Frequently Asked Questions
Q: What makes this model unique?
Combining the Qwen2-VL architecture with the ColBERT strategy lets the model retrieve documents from their visual features, which makes it well suited to PDF-type documents and multimodal content.
Q: What are the recommended use cases?
The model suits document retrieval systems that must handle both text and images, particularly in academic and professional settings that work with PDF documents. It is especially useful for applications that need efficient indexing and retrieval of visual content.