multilingual-e5-base

intfloat

Multilingual text embedding model supporting 94 languages, trained on 1B+ text pairs with strong performance on retrieval and similarity tasks

Property             Value
Parameter Count      278M
License              MIT
Paper                Multilingual E5 Text Embeddings: A Technical Report
Languages Supported  94 languages

What is multilingual-e5-base?

Multilingual-E5-Base is a text embedding model designed for cross-lingual understanding and retrieval tasks. Built on the XLM-RoBERTa architecture, it has 12 transformer layers and produces 768-dimensional embeddings. It was first trained on over 1 billion weakly supervised text pairs across many languages, then fine-tuned on diverse supervised datasets.

Implementation Details

The model is trained in two stages. The first is contrastive pre-training with weak supervision on large multilingual datasets, including mC4, CC News, and NLLB. The second is supervised fine-tuning on high-quality datasets such as MS MARCO, NQ, and multilingual retrieval datasets.
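To make the contrastive stage concrete, here is a minimal sketch of an InfoNCE-style loss with in-batch negatives, written in plain Python. It assumes L2-normalized vectors and uses a temperature of 0.01 as an illustrative value; the function name `info_nce_loss` and the toy vectors are hypothetical and are not taken from the model's training code.

```python
import math

def info_nce_loss(query_vecs, passage_vecs, temperature=0.01):
    """Mean InfoNCE loss where passage i is the positive for query i.

    Every other passage in the batch acts as a negative. Vectors are assumed
    to be L2-normalized, so the dot product equals cosine similarity.
    The temperature is an illustrative assumption.
    """
    total = 0.0
    for i, q in enumerate(query_vecs):
        # Scaled similarity of this query to every passage in the batch.
        logits = [sum(a * b for a, b in zip(q, p)) / temperature
                  for p in passage_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the true pair.
        total += log_denom - logits[i]
    return total / len(query_vecs)

# Toy batch: two pairs pointing in orthogonal directions.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(queries, passages)         # positives match
shuffled = info_nce_loss(queries, passages[::-1])  # positives mismatched
```

The loss is small when each query is closest to its own passage and large when the pairing is wrong, which is the signal that pulls matching texts together in the embedding space.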

  • Architecture: 12-layer transformer with 768-dimensional embeddings
  • Training Data: 1B+ text pairs from diverse sources
  • Input Format: Every input text should start with a "query: " or "passage: " prefix (including the trailing space) for best performance
  • Supported Tasks: Retrieval, semantic similarity, clustering, classification
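The prefix requirement is easy to miss, so a short sketch may help. The helper names `add_prefix` and `embed` are hypothetical. The sketch assumes the sentence-transformers package is installed and that the model is published on the Hugging Face Hub as `intfloat/multilingual-e5-base`. The import is kept inside `embed` so the prefix helper can be used on its own.

```python
def add_prefix(texts, kind):
    """Prepend the role prefix ("query: " or "passage: ") to each text."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {t.strip()}" for t in texts]

def embed(model, texts, kind):
    """Encode prefixed texts into L2-normalized embedding vectors."""
    return model.encode(add_prefix(texts, kind), normalize_embeddings=True)

def load_model(model_name="intfloat/multilingual-e5-base"):
    """Load the model once and reuse it; the Hub id is an assumption."""
    # Imported lazily so add_prefix works without the library installed.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name)

# Prefixing alone needs no model download.
queries = add_prefix(["how much protein should a female eat"], "query")
passages = add_prefix(["  Protein needs vary with age and activity level. "], "passage")

# With the library installed, usage would look like:
# model = load_model()
# query_vecs = embed(model, ["how much protein should a female eat"], "query")
```

In a retrieval setup, search queries get the "query: " prefix and indexed documents get "passage: "; the prefix is applied to texts in every language.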

Core Capabilities

  • Strong performance on multilingual retrieval (65.9 average MRR@10 on Mr. TyDi)
  • Effective text embeddings for 94 languages
  • State-of-the-art results on the MTEB benchmark
  • Flexible integration with popular frameworks like sentence-transformers
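For readers unfamiliar with the MRR@10 metric cited above, the following sketch shows how it is computed: for each query, take the reciprocal of the rank of the first relevant document within the top 10, then average over queries. The function name `mrr_at_k` and the document ids are made up for illustration.

```python
def mrr_at_k(ranked_ids_per_query, relevant_ids_per_query, k=10):
    """Mean reciprocal rank of the first relevant document within the top k."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_ids_per_query):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_ids_per_query)

# Three toy queries: relevant doc at rank 2, at rank 1, and not retrieved.
rankings = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
relevant = [{"d1"}, {"d2"}, {"d0"}]
score = mrr_at_k(rankings, relevant)  # (1/2 + 1 + 0) / 3 = 0.5
```

Benchmarks usually report this value multiplied by 100.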

Frequently Asked Questions

Q: What makes this model unique?

The model combines extensive multilingual pre-training with targeted supervised fine-tuning, which gives it strong performance across languages at a compact size of 278M parameters.

Q: What are the recommended use cases?

The model excels at cross-lingual information retrieval, semantic similarity computation, and document clustering. It's particularly effective for applications requiring multilingual understanding.
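As a rough illustration of the retrieval use case, the sketch below ranks passages by cosine similarity to a query vector. The 3-dimensional vectors are toy stand-ins for the model's 768-dimensional embeddings, and the names `cosine` and `rank_passages` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted by descending cosine similarity."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for real query and passage embeddings.
query_vec = [0.9, 0.1, 0.0]
passage_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
order = rank_passages(query_vec, passage_vecs)  # [1, 2, 0]
```

Because queries and passages in different languages are embedded into the same vector space, the same ranking step applies whether or not the query and the documents share a language.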
