bge-multilingual-gemma2

Maintained By
BAAI


Parameter Count: 9.24B
Model Type: Text Embedding
Base Model: Google Gemma-2-9B
License: Gemma License
Papers: BGE M3-Embedding Paper

What is bge-multilingual-gemma2?

BGE-Multilingual-Gemma2 is a multilingual text embedding model built on Google's Gemma-2-9B. It generates text embeddings across multiple languages, which makes it useful for cross-lingual information retrieval and semantic search. The model reports state-of-the-art results on multilingual benchmarks including MIRACL, MTEB-pl, and MTEB-fr.

Implementation Details

The model has 9.24B parameters and was trained with techniques that include self-knowledge distillation. It accepts inputs of up to 4096 tokens and outputs dense vectors, so that texts with similar meaning map to nearby vectors regardless of language.
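To make "dense vector representations" concrete, the sketch below compares embeddings with cosine similarity, the usual way such vectors are scored. The 4-dimensional vectors are invented toy values standing in for real model outputs, and the helper names are hypothetical.

```python
import math
from typing import Sequence, List


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine scores."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


# Toy vectors (assumption: NOT real model outputs).
emb_en = [0.9, 0.1, 0.0, 0.4]      # e.g. an English sentence
emb_fr = [0.8, 0.2, 0.1, 0.5]      # e.g. its French translation
emb_other = [-0.1, 0.9, 0.3, -0.2]  # e.g. an unrelated sentence

if __name__ == "__main__":
    print("en vs fr:   ", round(cosine_similarity(emb_en, emb_fr), 3))
    print("en vs other:", round(cosine_similarity(emb_en, emb_other), 3))
```

With real embeddings, a sentence and its translation would be expected to score higher than an unrelated pair, as the toy values do here.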

  • Supports multiple languages including English, Chinese, Japanese, Korean, French, and more
  • Optimized for retrieval tasks with specialized query instruction handling
  • Compatible with popular frameworks like HuggingFace Transformers and Sentence Transformers
  • Supports FP16 precision for more memory-efficient inference
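The points above can be sketched with Sentence Transformers as follows. This is a minimal sketch under several assumptions: the repository id `BAAI/bge-multilingual-gemma2`, the `<instruct>...<query>` prompt template, and the example task description are not stated on this page and should be checked against the official model card. The helper names are hypothetical, and the heavy imports are deferred so the formatting helpers can be used without loading the model.

```python
from typing import List

# Assumption: Hugging Face repository id, inferred from the maintainer and model name.
MODEL_ID = "BAAI/bge-multilingual-gemma2"


def build_query_prompt(task_description: str) -> str:
    """Prefix applied to queries only (assumed template; verify on the model card)."""
    return f"<instruct>{task_description}\n<query>"


def format_query(task_description: str, query: str) -> str:
    """Full query string as it would be passed to the tokenizer."""
    return build_query_prompt(task_description) + query


def embed(queries: List[str], documents: List[str], task_description: str):
    """Encode queries with the instruction prefix and documents without it.

    Requires `torch` and `sentence-transformers`, plus enough GPU memory
    for a 9.24B-parameter model in FP16.
    """
    import torch
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(
        MODEL_ID,
        model_kwargs={"torch_dtype": torch.float16},  # FP16 weights
    )
    model.max_seq_length = 4096  # input limit stated above

    query_emb = model.encode(
        queries,
        prompt=build_query_prompt(task_description),
        normalize_embeddings=True,
    )
    doc_emb = model.encode(documents, normalize_embeddings=True)
    return query_emb, doc_emb


if __name__ == "__main__":
    task = "Given a web search query, retrieve relevant passages that answer the query."
    print(format_query(task, "how much protein should a female eat"))
```

Applying the instruction to queries but not to documents is what makes the setup asymmetric: short queries carry the task description, while passages are embedded as-is.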

Core Capabilities

  • State-of-the-art performance on multilingual benchmarks
  • Robust semantic understanding across languages
  • Efficient retrieval and similarity matching
  • Flexible integration options with major ML frameworks
  • Support for both symmetric and asymmetric similarity tasks

Frequently Asked Questions

Q: What makes this model unique?

The model's key differentiator is its combination of scale (9.24B parameters) with multilingual coverage: it reports state-of-the-art results on several language-specific benchmarks, and it can run in FP16 to reduce inference cost.

Q: What are the recommended use cases?

The model excels in cross-lingual information retrieval, semantic search, document similarity matching, and multilingual knowledge mining. It's particularly well-suited for applications requiring high-quality semantic understanding across multiple languages.
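As an illustration of the semantic search use case, the sketch below ranks a small multilingual corpus against a query by cosine similarity. The 3-dimensional vectors are invented toy values, not real model outputs; in practice they would come from the embedding model. The function name `top_k` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(i, _cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy corpus in three languages with made-up embeddings (assumption).
documents = [
    "Paris is the capital of France.",
    "東京は日本の首都です。",
    "Le fromage est un produit laitier.",
]
doc_vectors = [
    [1.0, 0.1, 0.0],
    [0.2, 1.0, 0.1],
    [0.0, 0.2, 1.0],
]
query_vector = [0.3, 0.9, 0.0]  # stands in for "What is the capital of Japan?"

if __name__ == "__main__":
    for idx, score in top_k(query_vector, doc_vectors, k=2):
        print(f"{score:.3f}  {documents[idx]}")
```

For large corpora, the exhaustive scan shown here is typically replaced by an approximate nearest-neighbor index; the ranking logic stays the same.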
