turkish-colpali

Property	Value
Base Model	vidore/colpali-v1.3-hf
Architecture	PaliGemma-3B
Authors	Selim Çavaş & Muhammet Fatih Aktuğ
Primary Language	Turkish

What is turkish-colpali?

turkish-colpali is a fine-tuned version of the ColPali model, adapted for Turkish document retrieval. It is built on the PaliGemma-3B architecture and combines visual and textual features of each page to index and retrieve documents. The model was trained on curated Turkish textbooks and science magazine content, so it is aimed at academic and scientific documents.

Implementation Details

The model uses a Vision Language Model (VLM) to produce ColBERT-style multi-vector representations: each page is encoded as many embedding vectors rather than a single one. To build the training data, PDF documents were converted to page images, and gemini-2.0-flash-exp was used to generate synthetic queries for them. Because pages are indexed from their images, the model supports both textual and visual retrieval, extending what a text-only RAG pipeline can index.
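
ColBERT-style multi-vector retrieval is usually scored with "late interaction" (MaxSim): each query token vector is matched against its most similar page vector, and those maxima are summed. The sketch below shows that scoring rule in plain Python on toy 2-dimensional vectors. The function names and the example vectors are hypothetical illustrations, not part of this model's released code; real systems run the same computation as batched tensor operations.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late interaction: for each query vector take the best-matching
    page vector, then sum those maxima over all query vectors."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector], pages: List[List[Vector]]) -> List[int]:
    """Return page indices ordered from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (hypothetical): a 2-vector query and two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8]]
page_b = [[0.5, 0.5]]

print(maxsim_score(query, page_a))
print(rank_pages(query, [page_b, page_a]))
```

Page A wins here because each query vector finds a close match among its vectors, whereas page B's single vector is only a middling match for both; this is why multi-vector representations can capture fine-grained page content that one pooled vector would blur.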

Trained with a learning rate of 5e-05 and a linear scheduler
Uses the ADAMW_TORCH optimizer with specific beta parameters
Uses batch processing with gradient accumulation
Uses bfloat16 precision for efficient computation
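
The learning-rate schedule and gradient accumulation listed above can be illustrated with two small helpers. This is a sketch assuming a standard linear schedule with optional warmup (the same shape as get_linear_schedule_with_warmup in Hugging Face transformers); only the 5e-05 base rate comes from this card, while the total step count, warmup length, per-device batch size, and accumulation steps are hypothetical values chosen for illustration.

```python
BASE_LR = 5e-5  # learning rate stated in the card


def linear_lr(step: int, total_steps: int, base_lr: float = BASE_LR,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup to base_lr (if warmup_steps > 0),
    then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


def effective_batch_size(per_device: int, accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Examples contributing to one optimizer update when gradients are
    accumulated over several micro-batches before each optimizer step."""
    return per_device * accumulation_steps * num_devices


# Hypothetical run: 1000 optimizer steps, no warmup.
for s in (0, 500, 1000):
    print(s, linear_lr(s, total_steps=1000))

# Hypothetical batching: 4 examples per micro-batch, 8 accumulation steps.
print(effective_batch_size(per_device=4, accumulation_steps=8))  # 32
```

Gradient accumulation lets a memory-limited GPU emulate a larger batch: gradients from several micro-batches are summed before a single optimizer update, so the schedule above advances once per update, not once per micro-batch.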

Core Capabilities

Dual-modal document indexing (text and visual)
Efficient retrieval of Turkish academic content
PDF document processing and analysis
Multi-vector representation generation
Support for diverse query types
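
Multi-vector indexing stores many vectors per page, so index size grows quickly. The arithmetic below estimates raw storage in bfloat16 (2 bytes per value). The figures of 1030 vectors per page and 128 dimensions are illustrative assumptions, not values stated in this card; substitute the actual output shape of the model when sizing a real index.

```python
def index_size_bytes(num_pages: int, vectors_per_page: int, dim: int,
                     bytes_per_value: int = 2) -> int:
    """Raw storage for a multi-vector index; bytes_per_value=2 for bfloat16."""
    return num_pages * vectors_per_page * dim * bytes_per_value


# Assumed shape (illustrative only): 1030 vectors of 128 dimensions per page.
per_page = index_size_bytes(1, vectors_per_page=1030, dim=128)
print(per_page)  # 263680 bytes, roughly 257 KiB per page

# The same assumption scaled to a hypothetical 10,000-page collection, in GiB.
print(index_size_bytes(10_000, 1030, 128) / 2**30)
```

Halving the value size (bfloat16 instead of float32) halves this footprint, which is one practical reason to keep embeddings in bfloat16.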

Frequently Asked Questions

Q: What makes this model unique?

It processes both the visual and textual content of Turkish documents, which makes it useful for pages where layout, figures, and text all carry information. Because it was fine-tuned on Turkish academic content, it is geared toward educational and scientific materials.

Q: What are the recommended use cases?

The model is best suited to Turkish textbooks, scientific magazines, and other well-structured PDF documents. Typical applications include academic content management systems, digital libraries, and educational resource indexing, where both the visual and textual content of pages need to be analyzed.
