ColQwen2 v1.0
Property | Value |
---|---|
License | MIT |
Base Model | Qwen2-VL-2B-Instruct |
Paper | ColPali: Efficient Document Retrieval with Vision Language Models |
Language | English |
What is colqwen2-v1.0?
ColQwen2 v1.0 is a visual document retrieval model that extends Qwen2-VL-2B-Instruct with a ColBERT-style late-interaction strategy, so that documents can be indexed and retrieved directly from their visual features (page images). This release improves on its predecessor by training with a larger batch size (256 instead of 32) and an updated pad token.
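The ColBERT strategy scores a query against a page by late interaction (often called MaxSim): each query vector is matched with its most similar page vector, and those best-match similarities are summed. The sketch below illustrates that scoring rule in plain Python on toy 2-D vectors. The helper names (`maxsim_score`, `rank_pages`) are hypothetical, and the sketch assumes the vectors are already L2-normalized so that a dot product equals cosine similarity; it is not the model's own implementation.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]

def dot(u: Vector, v: Vector) -> float:
    """Dot product; equals cosine similarity when both vectors are L2-normalized."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late interaction: take the best page match for each query vector, then sum."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

def rank_pages(query_vecs: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page ids ordered from highest to lowest MaxSim score."""
    return sorted(pages, key=lambda pid: maxsim_score(query_vecs, pages[pid]), reverse=True)

# Toy data: a query with 2 vectors and two pages with 2 vectors each (all unit length).
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.6, 0.8]],
    "page_b": [[0.0, 1.0], [0.0, 1.0]],
}

if __name__ == "__main__":
    for pid in rank_pages(query, pages):
        print(pid, round(maxsim_score(query, pages[pid]), 3))
```

Here `page_a` scores 1.8 and `page_b` scores 1.0: every query vector finds a strong match in `page_a`, whereas `page_b` only matches the second query vector, and its duplicate vector adds nothing because only the maximum per query vector counts.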
Implementation Details
The model accepts images at dynamic resolutions: it does not resize them to a fixed shape, so their aspect ratio is preserved. The maximum resolution is capped so that at most 768 image patches are created per image, balancing retrieval performance against memory use. Training used bfloat16 precision, LoRA adapters (alpha=32, r=32), and the paged_adamw_8bit optimizer.
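To illustrate how a patch cap can coexist with dynamic resolution, here is a minimal sketch that computes a patch grid for an image and, when the grid exceeds 768 patches, shrinks both sides by the same factor so the aspect ratio is roughly kept. It assumes each patch covers a 28x28 pixel tile (as with Qwen2-VL's merged visual tokens); `capped_grid` is a hypothetical helper, and the exact rounding used by the model's own processor may differ.

```python
import math

PATCH_PX = 28      # assumed side length, in pixels, of one image patch
MAX_PATCHES = 768  # patch cap stated in the model card

def capped_grid(height: int, width: int,
                patch_px: int = PATCH_PX,
                max_patches: int = MAX_PATCHES) -> tuple:
    """Hypothetical helper: return (rows, cols) of patches for an image.

    Small images keep their native grid; larger ones are scaled down by a
    single factor on both axes so the aspect ratio is roughly preserved.
    """
    rows = max(1, round(height / patch_px))
    cols = max(1, round(width / patch_px))
    if rows * cols > max_patches:
        # One shared scale factor keeps rows/cols close to the original ratio.
        scale = math.sqrt(max_patches / (rows * cols))
        rows = min(max(1, math.floor(rows * scale)), max_patches)
        # Clamp cols too, so extreme aspect ratios still respect the cap.
        cols = min(max(1, math.floor(cols * scale)), max_patches // rows)
    return rows, cols

if __name__ == "__main__":
    for h, w in [(280, 560), (2240, 1680)]:
        r, c = capped_grid(h, w)
        print(f"{h}x{w} px -> {r}x{c} grid = {r * c} patches")
```

A 280x560 image keeps its native 10x20 grid, while a 2240x1680 page (an 80x60 grid, 4,800 patches, before capping) is scaled down to roughly 32x24 so it fits within the 768-patch budget.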
- Trained on 127,460 query-page pairs
- Uses low-rank adapters (LoRA) on the transformer layers
- Implements ColBERT-style multi-vector representations
- Supports dynamic image resolutions
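The LoRA settings given earlier (alpha=32, r=32) determine how the adapter update is scaled. In LoRA, a frozen weight matrix W is augmented with a trainable low-rank product B·A, and the layer computes W x + (alpha / r)·B(A x); with alpha = r = 32 the scale factor is exactly 1. The toy sketch below shows that forward pass in plain Python with a rank-1 adapter on a 3x3 layer (alpha = r = 1, the same ratio as 32/32). `lora_forward` is a hypothetical helper and the "trained" values are made up; this is not the training code used for this model.

```python
from typing import List

Matrix = List[List[float]]

def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: List[float],
                 alpha: float, r: int) -> List[float]:
    """LoRA layer output: W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; A (r x d_in) and B (d_out x r) are the
    trainable low-rank factors.
    """
    scale = alpha / r  # equals 1.0 when alpha == r, e.g. 32 / 32
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

# Toy layer: d_in = d_out = 3, rank r = 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0, 1.0, 1.0]]                   # r x d_in
B_init = [[0.0], [0.0], [0.0]]          # LoRA initializes B to zero
B_trained = [[0.5], [0.0], [0.0]]       # made-up values after some training
x = [1.0, 2.0, 3.0]

if __name__ == "__main__":
    print(lora_forward(W, A, B_init, x, alpha=1.0, r=1))
    print(lora_forward(W, A, B_trained, x, alpha=1.0, r=1))
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer's output exactly; training then only has to learn the small factors A and B instead of the full matrix W.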
Core Capabilities
- Efficient document indexing from visual features
- Multi-vector representation generation
- Dynamic image resolution processing
- Potential for zero-shot generalization to non-English languages
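Because a multi-vector index stores many vectors per page rather than one, its size grows with the number of image patches. The sketch below gives a rough upper-bound estimate under stated assumptions: one vector per image patch (at most 768, from the patch cap), a 128-dimensional embedding per vector, and bfloat16 storage at 2 bytes per value. The embedding dimension and the one-vector-per-patch layout are assumptions here, and `index_size_bytes` is a hypothetical helper.

```python
DIM = 128          # assumed embedding dimension per vector
MAX_VECTORS = 768  # assumed: at most one vector per image patch (patch cap)
BYTES_BF16 = 2     # bfloat16 stores each value in 2 bytes

def index_size_bytes(num_pages: int,
                     vectors_per_page: int = MAX_VECTORS,
                     dim: int = DIM,
                     bytes_per_value: int = BYTES_BF16) -> int:
    """Hypothetical helper: upper-bound size of a multi-vector page index."""
    return num_pages * vectors_per_page * dim * bytes_per_value

if __name__ == "__main__":
    per_page = index_size_bytes(1)
    thousand = index_size_bytes(1000)
    print(f"1 page: {per_page} bytes ({per_page / 1024:.0f} KiB)")
    print(f"1,000 pages: {thousand / 2**20:.1f} MiB")
```

Under these assumptions, one page takes at most 196,608 bytes (192 KiB) and 1,000 pages about 187.5 MiB, which is the memory side of the patch-count trade-off described above.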
Frequently Asked Questions
Q: What makes this model unique?
Its combination of the Qwen2-VL architecture with a ColBERT-style late-interaction strategy, together with its support for dynamic image resolutions and multi-vector representations, sets it apart from other document retrieval models.
Q: What are the recommended use cases?
The model is particularly well suited to PDF-type document retrieval, academic research, and applications that need efficient indexing and retrieval of documents combining visual and textual content.