mxbai-embed-large-v1
| Property | Value |
|---|---|
| Author | mixedbread-ai |
| License | Apache 2.0 |
| Model Type | Sentence Embeddings |
| MTEB Score | 64.68% (average across 56 datasets) |
What is mxbai-embed-large-v1?
mxbai-embed-large-v1 is a sentence embedding model developed by Mixedbread AI that achieves state-of-the-art performance among BERT-large-sized models on the MTEB benchmark. It outperforms commercial offerings such as OpenAI's text-embedding-3-large and matches models roughly 20x its size, such as echo-mistral-7b.
Implementation Details
The model implements both Matryoshka Representation Learning (MRL) and binary quantization techniques to optimize embedding storage and performance. It supports multiple integration methods including sentence-transformers, Transformers, Transformers.js, and API access.
- Supports dimension reduction through MRL
- Implements binary and int8 quantization
- Requires specific prompt format for retrieval tasks
- Compatible with both CLS and mean pooling strategies
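Because MRL truncation and binary quantization are plain vector transformations, the storage savings can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: random data stands in for real model output, and only the 1024-dimension embedding size is taken from the model itself.

```python
import numpy as np

# Random vectors stand in for real mxbai-embed-large-v1 embeddings
# (which are 1024-dimensional float32).
rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 1024)).astype(np.float32)

# Matryoshka Representation Learning: the leading dimensions carry the
# most information, so a smaller embedding is simply a prefix slice,
# re-normalized for cosine similarity.
def truncate_mrl(x: np.ndarray, dim: int) -> np.ndarray:
    x = x[:, :dim]
    return x / np.linalg.norm(x, axis=1, keepdims=True)

emb_512 = truncate_mrl(emb, 512)   # half the storage, same dtype

# Binary quantization: keep only the sign of each dimension and pack
# 8 dimensions per byte -> 32x smaller than float32.
def binary_quantize(x: np.ndarray) -> np.ndarray:
    return np.packbits((x > 0).astype(np.uint8), axis=1)

emb_bin = binary_quantize(emb)     # shape (4, 128), dtype uint8

print(emb.nbytes, emb_512.nbytes, emb_bin.nbytes)  # 16384 8192 512
```

The two techniques compose: truncating to 512 dimensions and then binary-quantizing would cut storage 64x relative to the full float32 vectors, at some cost in retrieval quality.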
Core Capabilities
- Classification: 75.64% across 12 datasets
- Clustering: 46.71% across 11 datasets
- Pair classification: 87.20%
- Semantic textual similarity (STS): 85.00%
- Retrieval: 54.39% across 15 datasets
Frequently Asked Questions
Q: What makes this model unique?
The model combines state-of-the-art performance with efficient storage solutions through MRL and quantization, making it particularly suitable for production environments where both accuracy and resource efficiency are crucial.
Q: What are the recommended use cases?
The model excels at semantic search, document retrieval, text similarity analysis, and classification tasks. For retrieval, it is most effective when each query is prefixed with the dedicated prompt: "Represent this sentence for searching relevant passages:"
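A minimal sketch of how that prompt is applied: the prefix goes on the query only, while passages are embedded as-is. The prompt string is from the model card; the helper name and the trailing space after the colon are assumptions, and the actual embedding call (e.g. via sentence-transformers) is left out.

```python
# Retrieval prompt from the model card; queries get the prefix,
# passages do not. The trailing space is an assumption about how the
# prefix and query text are joined.
RETRIEVAL_PROMPT = "Represent this sentence for searching relevant passages: "

def prepare_query(query: str) -> str:
    """Hypothetical helper: prepend the retrieval prompt to a query."""
    return RETRIEVAL_PROMPT + query

query = prepare_query("What is Matryoshka Representation Learning?")
passages = [
    "MRL trains embeddings so that prefixes of the vector remain useful.",
    "Binary quantization stores only the sign of each dimension.",
]

# A model such as sentence-transformers' mixedbread-ai/mxbai-embed-large-v1
# would then encode [query] and passages, and rank passages by cosine
# similarity to the query embedding.
print(query)
```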