ColQwen2 v1.0

Maintained by vidore

License: MIT
Base Model: Qwen2-VL-2B-Instruct
Paper: ColPali: Efficient Document Retrieval with Vision Language Models
Language: English

What is colqwen2-v1.0?

ColQwen2 v1.0 is a visual retrieval model that combines the Qwen2-VL-2B-Instruct backbone with a ColBERT-style late-interaction strategy to index and retrieve documents from their visual features. Compared with earlier checkpoints, this version was trained with a larger batch size (256 instead of 32) and uses an updated pad token implementation.

Implementation Details

The model uses a dynamic image resolution approach, processing images without forced resizing or aspect ratio changes. It is capped at a maximum of 768 image patches per image, striking a balance between retrieval quality and memory efficiency. Training was conducted in bfloat16 with LoRA adapters (alpha=32, r=32) and the paged_adamw_8bit optimizer.

  • Trained on 127,460 query-page pairs
  • Uses low-rank adapters for transformer layers
  • Implements ColBERT-style multi-vector representations
  • Supports dynamic image resolutions
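The interplay between dynamic resolution and the 768-patch cap described above can be sketched as follows. The 28-pixel effective patch size and the aspect-ratio-preserving rescaling rule are illustrative assumptions, not values stated on this card:

```python
import math

MAX_PATCHES = 768   # cap stated on this card
PATCH_SIZE = 28     # assumed effective patch edge (hypothetical value for illustration)

def patch_grid(width: int, height: int) -> tuple[int, int]:
    """Patch columns/rows for an image kept at its native resolution."""
    return math.ceil(width / PATCH_SIZE), math.ceil(height / PATCH_SIZE)

def fit_to_budget(width: int, height: int) -> tuple[int, int]:
    """Downscale (preserving aspect ratio) only when the patch count exceeds the cap."""
    cols, rows = patch_grid(width, height)
    while cols * rows > MAX_PATCHES:
        scale = math.sqrt(MAX_PATCHES / (cols * rows))
        width, height = int(width * scale), int(height * scale)
        cols, rows = patch_grid(width, height)
    return width, height

# An A4-ish page scan is downscaled until its patch grid fits the budget
w, h = fit_to_budget(1240, 1754)
cols, rows = patch_grid(w, h)
assert cols * rows <= MAX_PATCHES
```

Small pages that already fit the budget pass through unchanged, so typical documents keep their native resolution and aspect ratio.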

Core Capabilities

  • Efficient document indexing from visual features
  • Multi-vector representation generation
  • Dynamic image resolution processing
  • Zero-shot generalization potential to non-English languages
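The ColBERT-style multi-vector scoring behind these capabilities can be sketched in NumPy. The embedding shapes below are made up for illustration; in practice the model emits one vector per query token and one per image patch:

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score: for each query-token vector, take the
    maximum dot product over all page-patch vectors, then sum over tokens."""
    sim = query_emb @ page_emb.T          # (num_query_tokens, num_patches)
    return float(sim.max(axis=1).sum())

rng = np.random.default_rng(0)
query = rng.standard_normal((12, 128))                        # 12 query tokens
pages = [rng.standard_normal((700, 128)) for _ in range(3)]   # 3 indexed pages

# Rank indexed pages by MaxSim score, best match first
ranking = sorted(range(len(pages)),
                 key=lambda i: maxsim_score(query, pages[i]),
                 reverse=True)
```

Because page embeddings can be precomputed offline, query-time cost is limited to one matrix product and a max-reduction per page, which is what makes this indexing scheme efficient.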

Frequently Asked Questions

Q: What makes this model unique?

The combination of the Qwen2-VL architecture with a ColBERT-style late-interaction strategy, together with dynamic image resolution handling and multi-vector representations, sets it apart in the document retrieval space.

Q: What are the recommended use cases?

The model is particularly well-suited for PDF-type document retrieval tasks, academic research, and applications requiring efficient visual-textual document indexing and retrieval.
