ColQwen2 v1.0 Merged

Property	Value
Parameter Count	2.21B
Model Type	Vision Language Model
Precision	BF16
License	MIT
Paper	ColPali: Efficient Document Retrieval with Vision Language Models

What is colqwen2-v1.0-merged?

ColQwen2 is a visual retrieval model that extends Qwen2-VL-2B-Instruct with ColBERT-style multi-vector representations for efficient document indexing. This merged version combines the base model and the pre-trained LoRA adapters into a single set of weights, which simplifies deployment: the model can be used as-is or fine-tuned further.
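To make "merged" concrete, here is a minimal sketch of the standard LoRA merge rule, W' = W + (alpha / r) * B * A, on a toy 2x2 layer. It is an illustration only, not the script used to produce this checkpoint; the helper names `matmul` and `merge_lora` are hypothetical. In practice a library routine such as PEFT's `merge_and_unload()` would normally do this.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return weight + (alpha / r) * (lora_b @ lora_a) as a new matrix.

    Shapes: weight (out, in), lora_a (r, in), lora_b (out, r).
    """
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # low-rank update, shape (out, in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values (assumed for illustration): identity weight and a rank-1 adapter.
# alpha=1, r=1 gives scale 1.0, the same ratio as the alpha=32, r=32 setting
# reported for this model.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, -0.5]]   # shape (1, 2)
lora_b = [[2.0], [4.0]]  # shape (2, 1)

merged = merge_lora(weight, lora_a, lora_b, alpha=1.0, r=1)
print(merged)
```

After merging, inference needs only the single `merged` matrix per layer, so no adapter has to be loaded at run time.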

Implementation Details

The model was trained on 127,460 query-page pairs: 63% from academic datasets and 37% synthetic, built from web-crawled PDF documents paired with VLM-generated queries. Training used LoRA adapters with alpha=32 and r=32 and an 8-bit AdamW optimizer, on 8 GPUs with data parallelism.

Training performed with bfloat16 precision
Learning rate of 5e-5 with linear decay
2.5% warmup steps
Batch size of 32
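The learning-rate settings above can be sketched as a schedule function. This is a rough illustration: the total step count is not given here, so `total_steps=4000` is a made-up value, and decaying all the way to zero is an assumption about what "linear decay" means. The function name `learning_rate` is hypothetical.

```python
def learning_rate(step, total_steps, peak_lr=5e-5, warmup_fraction=0.025):
    """Linear warmup to peak_lr, then linear decay to zero (assumed endpoint)."""
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from peak_lr down to 0 at total_steps.
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


# Hypothetical run length; 2.5% of 4000 steps gives 100 warmup steps.
total_steps = 4000
for step in (0, 50, 100, 2050, 4000):
    print(step, learning_rate(step, total_steps))
```

The same shape is what Hugging Face Transformers provides through `get_linear_schedule_with_warmup`, which would be the usual choice in a real training loop.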

Core Capabilities

Efficient document retrieval using visual features
Multi-vector representation generation for both text and images
Zero-shot generalization to non-English languages
Optimized for PDF-type document processing
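As a sketch of how multi-vector representations are compared in ColBERT-style retrieval, the late-interaction (MaxSim) score takes each query vector, finds its best-matching page vector by dot product, and sums those maxima. The 2-dimensional vectors below are toy values chosen for illustration; real embeddings come from the model. The function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vectors, page_vectors):
    """Sum over query vectors of the best dot product against any page vector."""
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


def rank_pages(query_vectors, pages):
    """Return page names sorted from highest to lowest MaxSim score."""
    scores = {name: maxsim_score(query_vectors, vecs) for name, vecs in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumed): one query with two token vectors, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "page_b": [[0.5, 0.5], [0.0, 0.0]],
}

print(maxsim_score(query, pages["page_a"]))
print(maxsim_score(query, pages["page_b"]))
print(rank_pages(query, pages))
```

Because page vectors do not depend on the query, they can be computed once at indexing time; only this scoring step runs per query.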

Frequently Asked Questions

Q: What makes this model unique?

The model combines Vision Language Model capabilities with ColBERT-style multi-vector retrieval, so it can index documents efficiently while still processing both visual and textual information.

Q: What are the recommended use cases?

The model is best suited to document retrieval, especially over PDF documents that mix text and visual content. It fits applications that need efficient indexing and retrieval of documents across multiple languages.