bge-multilingual-gemma2

BAAI

A powerful multilingual embedding model (9.24B params) based on Gemma-2 that achieves SOTA results across multiple languages and benchmarks for text embedding tasks

  • Parameter Count: 9.24B
  • Model Type: Text Embedding
  • Base Model: Google Gemma-2-9B
  • License: Gemma License
  • Papers: BGE M3-Embedding Paper

What is bge-multilingual-gemma2?

BGE-Multilingual-Gemma2 is a state-of-the-art multilingual embedding model built on Google's Gemma-2-9B architecture. It's designed to generate high-quality text embeddings across multiple languages, making it particularly valuable for cross-lingual information retrieval and semantic search applications. The model has achieved remarkable results on various benchmarks including MIRACL, MTEB-pl, and MTEB-fr, demonstrating its exceptional multilingual capabilities.

Implementation Details

The model leverages a large-scale architecture with 9.24B parameters and implements advanced training techniques including self-knowledge distillation. It can process inputs up to 4096 tokens and outputs dense vector representations that capture semantic meaning across languages.

  • Supports multiple languages including English, Chinese, Japanese, Korean, French, and more
  • Optimized for retrieval tasks with specialized query instruction handling
  • Compatible with popular frameworks like HuggingFace Transformers and Sentence Transformers
  • Supports efficient inference with FP16 precision option
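The specialized query-instruction handling mentioned above can be sketched as follows. BGE-family models prepend a task instruction to queries (but not to documents) before embedding; the `<instruct>`/`<query>` template below follows the convention published for this model family, and the task description is only an illustrative example.

```python
# Sketch of BGE-style query instruction formatting. The template follows
# the BGE model family's published convention; the task text is illustrative.
def get_detailed_instruct(task_description: str, query: str) -> str:
    """Wrap a raw query with a task instruction before embedding."""
    return f"<instruct>{task_description}\n<query>{query}"

task = "Given a web search query, retrieve relevant passages that answer the query."
queries = [
    get_detailed_instruct(task, "how much protein should a female eat"),
    get_detailed_instruct(task, "summit define"),
]
# Documents are embedded as-is, with no instruction prefix.
print(queries[0])
```

Only the query side carries the instruction; candidate documents are encoded unchanged, which is what makes this an asymmetric retrieval setup.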

Core Capabilities

  • State-of-the-art performance on multilingual benchmarks
  • Robust semantic understanding across languages
  • Efficient retrieval and similarity matching
  • Flexible integration options with major ML frameworks
  • Support for both symmetric and asymmetric similarity tasks
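Retrieval and similarity matching over the model's dense vectors typically reduces to cosine similarity. The sketch below uses small hand-made vectors in place of real embeddings (which would come from encoding queries and documents with the model); the scoring logic is the same either way.

```python
import numpy as np

# Toy stand-ins for real embeddings; in practice each vector would be
# produced by encoding a query or document with the embedding model.
query_emb = np.array([0.2, 0.9, 0.1])
doc_embs = np.array([
    [0.1, 0.8, 0.3],   # relevant document
    [0.9, 0.1, 0.0],   # unrelated document
])

def cosine_scores(q: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity: normalize both sides, then take dot products."""
    q = q / np.linalg.norm(q)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ q

scores = cosine_scores(query_emb, doc_embs)
best = int(np.argmax(scores))  # index of the most similar document
print(scores, best)
```

With normalized embeddings the dot product and cosine similarity coincide, so many serving stacks pre-normalize vectors once and use a plain inner-product index.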

Frequently Asked Questions

Q: What makes this model unique?

The model's key differentiator is its combination of massive scale (9.24B parameters) with broad multilingual coverage, achieving SOTA results across multiple language benchmarks while remaining practical to serve (e.g., with FP16 inference).

Q: What are the recommended use cases?

The model excels in cross-lingual information retrieval, semantic search, document similarity matching, and multilingual knowledge mining. It's particularly well-suited for applications requiring high-quality semantic understanding across multiple languages.
