ColQwen2-v0.1

Maintained by: vidore

| Property | Value |
| --- | --- |
| License | MIT |
| Base Model | Qwen2-VL-2B-Instruct |
| Research Paper | ColPali: Efficient Document Retrieval with Vision Language Models |
| Primary Language | English |

What is colqwen2-v0.1?

ColQwen2-v0.1 is a visual retrieval model that combines the Qwen2-VL-2B-Instruct backbone with a ColBERT-style late-interaction strategy for efficient document indexing. The model processes both text and images, generating multi-vector representations that enable efficient document retrieval directly from visual features.
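The ColBERT-style scoring behind these multi-vector representations can be sketched as follows. This is an illustrative NumPy implementation of late-interaction (MaxSim) scoring, not the model's actual code: each query token embedding is matched against its best document patch embedding, and the per-token maxima are summed.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query vector, take the
    maximum similarity over all document vectors, then sum the maxima.

    query_emb: (num_query_tokens, dim), rows L2-normalized
    doc_emb:   (num_doc_patches, dim),  rows L2-normalized
    """
    sim = query_emb @ doc_emb.T          # (Q, D) cosine similarities
    return float(sim.max(axis=1).sum())  # best document match per query token

# Toy example with random unit vectors (illustrative only)
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(16, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
score = maxsim_score(q, d)
```

Because each document's patch embeddings can be precomputed and indexed offline, only this cheap matrix product runs at query time, which is what makes the approach efficient for large corpora.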

Implementation Details

The model is trained with LoRA adapters (r=32, alpha=32) on the transformer layers, in bfloat16 with the paged_adamw_8bit optimizer. It accepts dynamic image resolutions without forced resizing, up to a budget of 768 image patches.

  • Trained on 127,460 query-page pairs
  • Uses data parallelism across 8 GPUs
  • Learning rate of 5e-5 with linear decay
  • 2.5% warmup steps
  • Batch size of 32
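The learning-rate schedule implied by the numbers above (peak 5e-5, 2.5% linear warmup, linear decay) can be sketched as a small helper. The function name and the assumption of a single epoch are illustrative, not taken from the training code:

```python
def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 5e-5, warmup_frac: float = 0.025) -> float:
    """Linear warmup for the first 2.5% of steps, then linear decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # ramp up from 0 to peak
    # linear decay from peak back to 0 over the remaining steps
    remaining = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / remaining)

# 127,460 query-page pairs / batch size 32 ≈ 3,983 steps
# (assumption: one pass over the data)
total_steps = 127_460 // 32
```

With these figures the warmup phase lasts roughly 99 optimizer steps before the decay begins.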

Core Capabilities

  • Dynamic image resolution processing
  • Multi-vector representations for both text and images
  • Efficient document indexing and retrieval
  • Zero-shot generalization potential for non-English languages

Frequently Asked Questions

Q: What makes this model unique?

Its combination of the Qwen2-VL architecture with a ColBERT-style late-interaction strategy enables efficient document retrieval over visual features, making it particularly effective for PDF-like documents and multi-modal content.

Q: What are the recommended use cases?

This model is ideal for document retrieval systems that need to process both text and images, particularly in academic and professional contexts working with PDF documents. It's especially useful for applications requiring efficient indexing and retrieval of visual content.
