deepset-mxbai-embed-de-large-v1
Property | Value |
---|---|
Parameter Count | 487M |
License | Apache 2.0 |
Paper | AnglE Paper |
Languages | German, English |
Performance | 51.7 NDCG@10 |
What is deepset-mxbai-embed-de-large-v1?
This is a state-of-the-art German/English embedding model developed jointly by Mixedbread and deepset. It is built on multilingual-e5-large and trained with the AnglE loss function. It was fine-tuned on more than 30 million pairs of high-quality German data, which makes it particularly effective for German-language applications while retaining strong English capabilities.
Implementation Details
The model supports both binary quantization and Matryoshka Representation Learning (MRL), which reduce storage and search cost without substantial performance loss. Binary quantization retains 91.8% of performance while increasing efficiency 32-fold. MRL allows a 25% reduction in vector size while preserving 97.5% of model performance.
- Supports prompt-based encoding with specific formats for queries and documents
- Reduces storage and search cost through binary quantization and MRL
- Performs particularly well in legal-domain applications
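To make the two efficiency techniques concrete, here is a minimal pure-Python sketch of MRL-style truncation and binary quantization. It is not the model's own implementation. It assumes a 1024-dimensional float32 embedding (the output size of multilingual-e5-large), uses a synthetic vector in place of real model output, and the helper names `truncate_embedding`, `binarize`, and `hamming_distance` are hypothetical.

```python
import math

FULL_DIM = 1024  # assumed embedding size (that of multilingual-e5-large)


def truncate_embedding(vec, dim):
    """MRL-style truncation: keep the first `dim` values, then L2-normalize."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]


def binarize(vec):
    """Binary quantization: one bit per dimension (1 if value > 0).

    Bits are packed eight to a byte; the sketch assumes len(vec) is a
    multiple of 8.
    """
    out = bytearray()
    for i in range(0, len(vec), 8):
        byte = 0
        for x in vec[i:i + 8]:
            byte = (byte << 1) | (1 if x > 0 else 0)
        out.append(byte)
    return bytes(out)


def hamming_distance(a, b):
    """Number of differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Synthetic stand-in for a real embedding (not model output).
vec = [math.sin(i) for i in range(FULL_DIM)]

small = truncate_embedding(vec, 768)  # 25% fewer dimensions than 1024
packed = binarize(vec)                # 1 bit per dimension
float_bytes = FULL_DIM * 4            # float32 storage for the full vector

print(len(small))                  # 768
print(len(packed))                 # 128 bytes
print(float_bytes // len(packed))  # 32
```

The 32-fold figure matches the storage arithmetic: 1024 float32 values occupy 4096 bytes, while 1024 bits occupy 128 bytes. Binary embeddings are typically compared with Hamming distance rather than cosine similarity, which is why the sketch includes `hamming_distance`.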
Core Capabilities
- State-of-the-art performance with 51.7 NDCG@10
- Efficient binary quantization support
- Matryoshka Representation Learning compatibility
- Bilingual support for German and English
- Optimized for retrieval tasks
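The retrieval workflow implied by these capabilities can be sketched as follows: prefix queries and documents with their prompts, embed them, and rank documents by cosine similarity. This is an illustrative sketch only. The `query: ` and `passage: ` prefixes follow the E5 family convention and are an assumption here, so check the model card for the exact strings. The toy 3-dimensional vectors stand in for real embeddings, and all function names are hypothetical.

```python
import math

# Assumed prompt prefixes (E5-style convention); confirm against the model card.
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def format_query(text):
    """Prepend the (assumed) query prompt before encoding."""
    return QUERY_PREFIX + text


def format_passage(text):
    """Prepend the (assumed) document prompt before encoding."""
    return PASSAGE_PREFIX + text


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real model output.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
]

order = rank(query_vec, doc_vecs)
print(format_query("Was ist ein Mietvertrag?"))
print(order)  # [1, 2, 0]
```

In a real pipeline, the formatted strings would be passed to the embedding model and the resulting vectors to `rank`.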
Frequently Asked Questions
Q: What makes this model unique?
The model combines state-of-the-art retrieval performance with efficiency features such as binary quantization and MRL, and it is tuned specifically for German-language tasks. It outperforms other open-source alternatives and matches some closed-source solutions.
Q: What are the recommended use cases?
The model is particularly well suited to retrieval tasks, especially in German-language contexts. It has shown strong performance in legal applications and can be used for semantic search and document similarity tasks in German or English.