colqwen2-v1.0-merged

Maintained by: vidore

ColQwen2 v1.0 Merged

Property          Value
Parameter Count   2.21B
Model Type        Visual Language Model
Precision         BF16
License           MIT
Paper             ColPali: Efficient Document Retrieval with Vision Language Models

What is colqwen2-v1.0-merged?

ColQwen2 is a visual retrieval model that extends Qwen2-VL-2B-Instruct with ColBERT-style multi-vector representations for efficient document indexing. This merged version combines the base model with its pre-trained LoRA adapters into a single checkpoint, which simplifies deployment and leaves the model ready for direct use or further fine-tuning.
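As a rough illustration of what merging a LoRA adapter means, the sketch below applies the standard LoRA update W' = W + (alpha / r) * B A to a toy weight matrix. The helper names (`matmul`, `merge_lora`) and the toy matrices are hypothetical; only alpha=32 and r=32 come from the Implementation Details below, and with those values the scale factor is 1.0. This is not the tooling used to produce the released checkpoint.

```python
# Values reported for this model's adapters; everything else is a toy assumption.
ALPHA = 32
R = 32


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, scale):
    """Return weight + scale * (lora_b @ lora_a), the merged dense weight."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter, kept tiny for readability
# (the real adapters have rank 32).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, -0.5]]    # shape (rank, in_features)
lora_b = [[2.0], [4.0]]   # shape (out_features, rank)

scale = ALPHA / R         # 32 / 32 = 1.0
merged = merge_lora(weight, lora_a, lora_b, scale)
```

After merging, a single matrix multiply with `merged` gives the same output as running the base layer and the adapter path separately, which is why a merged checkpoint needs no adapter-loading step at inference time.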

Implementation Details

The model was trained on 127,460 query-page pairs: 63% from academic datasets and 37% synthetic, built from web-crawled PDF documents paired with VLM-generated queries. Training used LoRA adapters with alpha=32 and r=32 and an 8-bit AdamW optimizer, run across 8 GPUs with data parallelism.

  • Training performed with bfloat16 precision
  • Learning rate of 5e-5 with linear decay
  • 2.5% warmup steps
  • Batch size of 32
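The learning-rate settings above can be sketched as a small schedule function. This is a minimal sketch that assumes a linear ramp during warmup and a linear decay to zero afterwards; the source only states "linear decay" and "2.5% warmup steps". The function name `learning_rate` and the total of 1,000 steps are hypothetical.

```python
def learning_rate(step, total_steps, peak_lr=5e-5, warmup_fraction=0.025):
    """Learning rate at a given step: linear warmup, then linear decay.

    Assumes the rate ramps from 0 to peak_lr over the warmup steps and
    then decays linearly to 0 at total_steps.
    """
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


# Hypothetical run length, chosen only to make the numbers easy to read.
TOTAL_STEPS = 1000
schedule = [learning_rate(s, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
```

With 1,000 total steps, warmup covers the first 25 steps, the peak of 5e-5 is reached at step 25, and the rate returns to zero at the final step.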

Core Capabilities

  • Efficient document retrieval using visual features
  • Multi-vector representation generation for both text and images
  • Zero-shot generalization to non-English languages
  • Optimized for PDF-type document processing
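ColBERT-style retrieval scores a query against a page by late interaction: for each query vector, take its best match among the page's vectors, then sum those maxima. The sketch below shows that scoring rule on toy 2-dimensional vectors. The function names and the vectors are illustrative assumptions, not the model's own API or real embeddings.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vectors, page_vectors):
    """Late-interaction score: sum over query vectors of the best page match."""
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


def rank_pages(query_vectors, pages):
    """Return (page_name, score) pairs sorted from best to worst."""
    scores = {name: maxsim_score(query_vectors, vecs) for name, vecs in pages.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy multi-vector embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "page_b": [[0.5, 0.5], [-1.0, 0.0]],
}

ranking = rank_pages(query, pages)
```

Here `page_a` ranks first because each query vector finds an exact match among its vectors, while `page_b` only offers partial matches.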

Frequently Asked Questions

Q: What makes this model unique?

The model combines Vision Language Model capabilities with ColBERT-style retrieval: it indexes documents efficiently through multi-vector representations while processing both visual and textual information.

Q: What are the recommended use cases?

The model is particularly suited to document retrieval tasks, especially those involving PDF documents with mixed text and visual content. It fits applications that need efficient indexing and retrieval of documents across multiple languages.
