# Granite Embedding 278M Multilingual
| Property | Value |
|---|---|
| Developer | IBM Granite Embedding Team |
| License | Apache 2.0 |
| Parameters | 278M |
| Embedding Size | 768 |
| Supported Languages | 12 (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese) |
| Release Date | December 18, 2024 |
## What is granite-embedding-278m-multilingual?
Granite-embedding-278m-multilingual is a multilingual embedding model from IBM that produces dense 768-dimensional vector representations of text. It was trained with contrastive fine-tuning, knowledge distillation, and model merging on a diverse mix of open-source and IBM-collected data.
## Implementation Details
The model is built on an encoder-only, XLM-RoBERTa-like transformer architecture with 12 layers, 12 attention heads, and an intermediate (feed-forward) size of 3072. It handles sequences of up to 512 tokens and uses GeLU activations.
- Compatible with both the SentenceTransformers and Hugging Face Transformers libraries (see the loading sketch after this list)
- Trained on over 300M text pairs across multiple languages and domains
- Optimized for enterprise use with Apache 2.0 license
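Loading the model through SentenceTransformers handles tokenization and pooling automatically. The following is a minimal sketch, assuming the model is published under the Hugging Face ID `ibm-granite/granite-embedding-278m-multilingual`; verify the identifier against the official model card.

```python
# Minimal SentenceTransformers sketch; the model ID below is assumed
# from IBM's Hugging Face organization.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ibm-granite/granite-embedding-278m-multilingual")

sentences = [
    "Granite models are designed for enterprise use.",
    "Los modelos Granite están diseñados para uso empresarial.",
]

# Each sentence is encoded into a 768-dimensional vector.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768)
```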
## Core Capabilities
- Text similarity computation across 12 languages (illustrated in the sketch below)
- Information retrieval and search applications
- Cross-lingual document matching
- Strong performance on MTEB benchmarks (48.2+ across languages)
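When using plain Hugging Face Transformers instead of SentenceTransformers, the token-level outputs must be pooled into a single sentence vector manually. The sketch below assumes CLS-token pooling with L2 normalization, a common choice for retrieval-tuned encoders; confirm the pooling strategy against the official model card, since some embedding models use mean pooling instead.

```python
# Similarity scoring with raw Transformers. CLS pooling and the model ID
# are assumptions; check the model card before relying on them.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "ibm-granite/granite-embedding-278m-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

texts = [
    "What is machine learning?",
    "Qu'est-ce que l'apprentissage automatique ?",
]
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Take the [CLS] hidden state as the sentence embedding (assumed pooling),
# then L2-normalize so the dot product equals cosine similarity.
embeddings = outputs.last_hidden_state[:, 0]
embeddings = torch.nn.functional.normalize(embeddings, dim=1)

similarity = embeddings[0] @ embeddings[1]
print(f"cosine similarity: {similarity.item():.3f}")
```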
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its enterprise-friendly license, comprehensive multilingual support, and strong performance achieved without using restrictively licensed datasets such as MS MARCO. It is specifically designed for production deployments, with evaluation across multiple languages.
Q: What are the recommended use cases?
The model excels in text similarity tasks, semantic search, information retrieval, and cross-lingual document matching. It's particularly suitable for enterprise applications requiring multilingual support and reliable performance.
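As an illustration of the semantic search use case, the hypothetical snippet below indexes a small cross-lingual corpus and retrieves the closest passages for an English query using the SentenceTransformers `util.semantic_search` helper. The corpus and query strings are invented for the example.

```python
# Hypothetical cross-lingual semantic search sketch over a toy corpus.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-278m-multilingual")

corpus = [
    "The invoice is due at the end of the month.",
    "Die Rechnung ist am Monatsende fällig.",
    "The cafeteria menu changes every week.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "When is the payment deadline?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the two most similar passages by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```

Both the English and German invoice sentences should rank above the unrelated cafeteria sentence, which is the cross-lingual matching behavior described above.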