mxbai-colbert-large-v1

Maintained By: mixedbread-ai


Property         Value
Parameter Count  335M
Model Type       ColBERT Reranking
License          Apache 2.0
Tensor Type      FP16

What is mxbai-colbert-large-v1?

mxbai-colbert-large-v1 is a ColBERT model developed by Mixedbread AI for document reranking and retrieval. It builds on the company's sentence embedding model, mxbai-embed-large-v1. On BEIR benchmark datasets it outperforms other established ColBERT models, and it does particularly well in out-of-domain scenarios.
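
As background, ColBERT models score a query against a document by late interaction: each query token embedding is matched to its most similar document token embedding, and those maxima are summed (often called MaxSim). The sketch below illustrates that scoring rule in plain Python with toy 2-dimensional vectors. The function names and vectors are illustrative assumptions, not the model's actual code or embeddings.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best cosine similarity to any document token."""
    if not query_vecs or not doc_vecs:
        return 0.0
    q = [_normalize(v) for v in query_vecs]
    d = [_normalize(v) for v in doc_vecs]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


# Toy token embeddings (hypothetical; real ones come from the model).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0]]  # has a good match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # only matches the first query token

scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because every query token contributes its own best match, a document that covers all parts of the query (doc_a here) ranks above one that only matches part of it.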

Implementation Details

The model has 335M parameters, and its weights are stored in FP16, which lowers memory use compared with FP32. It works with the RAGatouille library, which makes it straightforward to add to an existing search pipeline. In reranking tasks it outperforms other established ColBERT models on multiple BEIR datasets.

  • Leading NDCG@10 scores among the compared ColBERT models across 13 BEIR datasets
  • Strong results on specialized domains such as TREC-COVID (81.04 NDCG@10)
  • FP16 weights for efficient inference
  • Simple integration through the RAGatouille library
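
The RAGatouille integration can be sketched as follows. This is a minimal example, assuming the ragatouille package is installed and that the model is published on the Hugging Face Hub under the id mixedbread-ai/mxbai-colbert-large-v1. RAGPretrainedModel.from_pretrained and rerank are RAGatouille calls; the wrapper function and its optional model parameter are hypothetical conveniences added for illustration.

```python
# Assumed Hugging Face Hub id (maintainer name + model name).
MODEL_NAME = "mixedbread-ai/mxbai-colbert-large-v1"


def rerank_with_colbert(query, documents, k=5, model=None):
    """Rerank `documents` for `query` and return the top-k results.

    `model` is a hypothetical hook: pass an already loaded model to avoid
    reloading the weights on every call.
    """
    if model is None:
        # Imported lazily so this sketch can be loaded without the package.
        # Requires `pip install ragatouille`; the first call downloads weights.
        from ragatouille import RAGPretrainedModel

        model = RAGPretrainedModel.from_pretrained(MODEL_NAME)
    # Never ask for more results than there are documents.
    k = min(k, len(documents))
    return model.rerank(query=query, documents=documents, k=k)


# Example call (commented out because it downloads the model):
# docs = ["ColBERT uses late interaction.", "BM25 is a lexical ranking function."]
# results = rerank_with_colbert("How does ColBERT score documents?", docs, k=2)
```

Loading the model once and passing it in through the model argument keeps repeated queries from paying the load cost each time.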

Core Capabilities

  • Document reranking with high precision
  • Zero-shot transfer to various domains
  • Strong performance in both reranking and retrieval tasks
  • Efficient processing of large document collections
  • Easy integration with existing search systems

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its consistent performance across different domains, achieving an average NDCG@10 of 50.37 across BEIR datasets, surpassing both ColBERTv2 and Jina-ColBERT-v1. It's particularly effective for specialized scientific content, as demonstrated by its superior performance on TREC-COVID and SciFact datasets.
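
For readers unfamiliar with the metric, NDCG@10 compares the discounted gain of the top 10 results against the best possible ordering of the same judged documents. The sketch below uses the linear-gain form and assumes the input list holds the relevance labels of all judged documents for the query, in ranked order. Benchmark toolkits may differ in details, so treat it as an illustration rather than the exact evaluation code behind the reported numbers. Figures such as 50.37 are presumably the 0 to 1 score multiplied by 100.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance labels."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels, listed in the order the system ranked them.
perfect = ndcg_at_k([3, 2, 1, 0])   # already in ideal order
swapped = ndcg_at_k([0, 1])         # the only relevant document is ranked second
```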

Q: What are the recommended use cases?

This model is ideal for applications requiring high-quality document reranking, especially in academic or scientific contexts. It's particularly well-suited for search systems that need to handle diverse content types and can be effectively used as a second-stage ranker after initial retrieval with simpler methods like BM25.
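
The two-stage setup described above can be sketched as follows: a cheap BM25 pass selects candidates, and a reranker reorders only that short list. The BM25 implementation here is a simplified, self-contained version with whitespace tokenization. The reranker argument is a hypothetical callable taking (query, candidates) and returning the candidates in a new order, which is where a ColBERT model would plug in. All names and default values are assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def two_stage_search(query, docs, reranker, first_stage_k=100, final_k=10):
    """Stage 1: BM25 picks candidates. Stage 2: `reranker` reorders them."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    candidates = [docs[i] for i in order[:first_stage_k]]
    return reranker(query, candidates)[:final_k]


def identity_reranker(query, candidates):
    """Placeholder that keeps the BM25 order; swap in a ColBERT reranker here."""
    return list(candidates)
```

Keeping first_stage_k much larger than final_k gives the reranker room to promote documents that BM25 scored modestly, while still limiting how many documents the larger model has to process.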
