# Granite Embedding 278M Multilingual
| Property | Value |
|---|---|
| Developer | IBM Granite Embedding Team |
| License | Apache 2.0 |
| Parameters | 278M |
| Embedding Size | 768 |
| Supported Languages | 12 (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese) |
| Release Date | December 18, 2024 |
## What is granite-embedding-278m-multilingual?
Granite-embedding-278m-multilingual is a multilingual embedding model from IBM that produces dense 768-dimensional vector representations of text. It was trained with contrastive fine-tuning, knowledge distillation, and model merging on a diverse mix of open-source and IBM-collected data.
## Implementation Details
The model is built on an encoder-only, XLM-RoBERTa-like transformer architecture with 12 layers, 12 attention heads, and an intermediate (feed-forward) size of 3072. It handles sequences of up to 512 tokens and uses GeLU activations.
- Compatible with both the SentenceTransformers and Hugging Face Transformers libraries (see the loading sketch after this list)
- Trained on over 300M text pairs across multiple languages and domains
- Optimized for enterprise use with Apache 2.0 license
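Loading the model through SentenceTransformers handles tokenization and pooling automatically. The following is a minimal sketch, assuming the model is published under the Hugging Face ID `ibm-granite/granite-embedding-278m-multilingual`; verify the identifier against the official model card.

```python
# Minimal SentenceTransformers sketch; the model ID below is assumed
# from IBM's Hugging Face organization.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ibm-granite/granite-embedding-278m-multilingual")

sentences = [
    "Granite models are designed for enterprise use.",
    "Los modelos Granite están diseñados para uso empresarial.",
]

# Each sentence is encoded into a 768-dimensional vector.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768)
```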
## Core Capabilities
- Text similarity computation across 12 languages (illustrated in the sketch below)
- Information retrieval and search applications
- Cross-lingual document matching
- Strong performance on MTEB benchmarks (48.2+ across languages)
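When using plain Hugging Face Transformers instead of SentenceTransformers, the token-level outputs must be pooled into a single sentence vector manually. The sketch below assumes CLS-token pooling with L2 normalization, a common choice for retrieval-tuned encoders; confirm the pooling strategy against the official model card, since some embedding models use mean pooling instead.

```python
# Similarity scoring with raw Transformers. CLS pooling and the model ID
# are assumptions; check the model card before relying on them.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "ibm-granite/granite-embedding-278m-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

texts = [
    "What is machine learning?",
    "Qu'est-ce que l'apprentissage automatique ?",
]
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Take the [CLS] hidden state as the sentence embedding (assumed pooling),
# then L2-normalize so the dot product equals cosine similarity.
embeddings = outputs.last_hidden_state[:, 0]
embeddings = torch.nn.functional.normalize(embeddings, dim=1)

similarity = embeddings[0] @ embeddings[1]
print(f"cosine similarity: {similarity.item():.3f}")
```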
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its enterprise-friendly license, comprehensive multilingual support, and strong performance achieved without using restrictively licensed datasets such as MS MARCO. It is specifically designed for production deployments, with evaluation across multiple languages.
Q: What are the recommended use cases?
The model excels in text similarity tasks, semantic search, information retrieval, and cross-lingual document matching. It's particularly suitable for enterprise applications requiring multilingual support and reliable performance.
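As an illustration of the semantic search use case, the hypothetical snippet below indexes a small cross-lingual corpus and retrieves the closest passages for an English query using the SentenceTransformers `util.semantic_search` helper. The corpus and query strings are invented for the example.

```python
# Hypothetical cross-lingual semantic search sketch over a toy corpus.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-278m-multilingual")

corpus = [
    "The invoice is due at the end of the month.",
    "Die Rechnung ist am Monatsende fällig.",
    "The cafeteria menu changes every week.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "When is the payment deadline?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the two most similar passages by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```

Both the English and German invoice sentences should rank above the unrelated cafeteria sentence, which is the cross-lingual matching behavior described above.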