colsmolvlm-v0.1

Maintained by: vidore

ColSmolVLM-v0.1

Property        Value
License         Apache 2.0 (backbone) / MIT (adapters)
Paper           ColPali: Efficient Document Retrieval with Vision Language Models
Training Data   127,460 query-page pairs
Architecture    SmolVLM with ColBERT strategy

What is colsmolvlm-v0.1?

ColSmolVLM is a vision language model built for document retrieval. It pairs the SmolVLM backbone with ColBERT-style multi-vector representations, so that document pages can be indexed and retrieved directly from their visual features. This version was trained with a batch size of 128 for 3 epochs using colpali-engine v0.3.5.
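ColBERT-style multi-vector retrieval is usually scored with a "MaxSim" late-interaction rule: each query vector is matched to its most similar page vector, and those maxima are summed. The sketch below illustrates that rule in plain Python on toy 2-dimensional vectors. The function names (dot, maxsim_score, rank_pages) and the data are hypothetical and are not part of colpali-engine; real embeddings would come from the model and have many more dimensions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query vector take the best-matching
    page vector, then sum those best matches over the whole query."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices ordered from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy data (illustrative only): two query vectors, two pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # matches both query vectors exactly
page_b = [[0.5, 0.5], [0.0, 0.0]]              # only a partial match

print(rank_pages(query, [page_b, page_a]))  # → [1, 0]
```

Because the page vectors do not depend on the query, they can be computed once at indexing time; only the lightweight MaxSim step runs per query.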

Implementation Details

The model was trained in bfloat16 with low-rank adapters (LoRA, alpha=32, r=32) and the paged_adamw_8bit optimizer. Training ran on 4 GPUs with data parallelism, using a learning rate of 5e-4 with linear decay and warmup over the first 2.5% of steps.

  • Multi-vector representation capability
  • Flash Attention 2 support
  • Efficient document indexing and retrieval
  • Zero-shot generalization to non-English languages
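To make the training hyperparameters above concrete, here is a minimal sketch of the learning-rate schedule they imply. It rests on several assumptions that the source does not confirm: that the batch size of 128 is the global batch size, that the last partial batch of each epoch is kept, and that the warmup step count is rounded up. The constant and function names are illustrative only.

```python
import math

# Values taken from the model card.
NUM_PAIRS = 127_460        # query-page training pairs
BATCH_SIZE = 128           # assumed to be the global batch size
EPOCHS = 3
PEAK_LR = 5e-4
WARMUP_FRACTION = 0.025    # 2.5% of all optimizer steps
LORA_ALPHA = 32
LORA_R = 32

# Derived quantities (rounding choices are assumptions).
STEPS_PER_EPOCH = math.ceil(NUM_PAIRS / BATCH_SIZE)
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_FRACTION)

# Standard LoRA scales the low-rank update by alpha / r.
LORA_SCALE = LORA_ALPHA / LORA_R


def learning_rate(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = TOTAL_STEPS - step
    return PEAK_LR * max(0.0, remaining / (TOTAL_STEPS - WARMUP_STEPS))


print("total steps:", TOTAL_STEPS, "warmup steps:", WARMUP_STEPS)
for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, learning_rate(step))
```

Under these assumptions training takes 2,988 optimizer steps, of which the first 75 are warmup, and with alpha equal to r the LoRA update is applied with a scale of 1.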

Core Capabilities

  • PDF document analysis and retrieval
  • Visual feature extraction and indexing
  • Cross-modal understanding between text and images
  • Efficient batch processing of queries and images

Frequently Asked Questions

Q: What makes this model unique?

The model combines SmolVLM's vision capabilities with ColBERT's late-interaction retrieval strategy to retrieve documents from their visual features. Because each query and each page is represented by multiple vectors rather than a single one, matching happens at the level of individual query tokens and image patches, which allows finer-grained retrieval.

Q: What are the recommended use cases?

The model is well suited to PDF document retrieval, academic research, and document analysis tasks, particularly where documents must be indexed and retrieved using both visual and textual content. It is expected to work best in high-resource languages.
