ColQwen2-v0.1

Maintained by: vidore

| Property | Value |
| --- | --- |
| License | MIT |
| Base Model | Qwen2-VL-2B-Instruct |
| Research Paper | ColPali: Efficient Document Retrieval with Vision Language Models |
| Primary Language | English |

What is colqwen2-v0.1?

ColQwen2-v0.1 is a visual retrieval model that combines the Qwen2-VL-2B-Instruct backbone with a ColBERT-style late-interaction strategy for efficient document indexing. The model processes both text and images, generating multi-vector representations that enable efficient document retrieval directly from visual features.
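The ColBERT-style scoring behind these multi-vector representations can be sketched as follows. This is an illustrative NumPy implementation of late-interaction (MaxSim) scoring, not the model's actual code: each query token embedding is matched against its best document patch embedding, and the per-token maxima are summed.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query vector, take the
    maximum similarity over all document vectors, then sum the maxima.

    query_emb: (num_query_tokens, dim), rows L2-normalized
    doc_emb:   (num_doc_patches, dim),  rows L2-normalized
    """
    sim = query_emb @ doc_emb.T          # (Q, D) cosine similarities
    return float(sim.max(axis=1).sum())  # best document match per query token

# Toy example with random unit vectors (illustrative only)
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(16, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
score = maxsim_score(q, d)
```

Because each document's patch embeddings can be precomputed and indexed offline, only this cheap matrix product runs at query time, which is what makes the approach efficient for large corpora.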

Implementation Details

The model is trained with LoRA adapters (r=32, alpha=32) on the transformer layers, in bfloat16 with the paged_adamw_8bit optimizer. It accepts dynamic image resolutions without forced resizing, up to a budget of 768 image patches.

  • Trained on 127,460 query-page pairs
  • Uses data parallelism across 8 GPUs
  • Learning rate of 5e-5 with linear decay
  • 2.5% warmup steps
  • Batch size of 32
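The learning-rate schedule implied by the numbers above (peak 5e-5, 2.5% linear warmup, linear decay) can be sketched as a small helper. The function name and the assumption of a single epoch are illustrative, not taken from the training code:

```python
def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 5e-5, warmup_frac: float = 0.025) -> float:
    """Linear warmup for the first 2.5% of steps, then linear decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # ramp up from 0 to peak
    # linear decay from peak back to 0 over the remaining steps
    remaining = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / remaining)

# 127,460 query-page pairs / batch size 32 ≈ 3,983 steps
# (assumption: one pass over the data)
total_steps = 127_460 // 32
```

With these figures the warmup phase lasts roughly 99 optimizer steps before the decay begins.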

Core Capabilities

  • Dynamic image resolution processing
  • Multi-vector representations for both text and images
  • Efficient document indexing and retrieval
  • Zero-shot generalization potential for non-English languages

Frequently Asked Questions

Q: What makes this model unique?

Its combination of the Qwen2-VL architecture with a ColBERT-style late-interaction strategy enables efficient document retrieval over visual features, making it particularly effective for PDF-like documents and multi-modal content.

Q: What are the recommended use cases?

This model is ideal for document retrieval systems that need to process both text and images, particularly in academic and professional contexts working with PDF documents. It's especially useful for applications requiring efficient indexing and retrieval of visual content.
