multilingual-e5-large

intfloat

A powerful multilingual text embedding model with 560M parameters supporting 94 languages, optimized for retrieval and semantic similarity tasks. It outperforms prior multilingual embedding models on the MTEB benchmark.

Parameter Count: 560M
Paper: Multilingual E5 Text Embeddings: A Technical Report
License: MIT
Languages Supported: 94 languages

What is multilingual-e5-large?

Multilingual-E5-Large is a state-of-the-art text embedding model that supports 94 languages and excels at tasks like semantic search, retrieval, and text similarity. Built on XLM-RoBERTa architecture with 24 layers and 1024 embedding size, it was trained through a two-stage process involving contrastive pre-training and supervised fine-tuning on diverse multilingual datasets.

Implementation Details

The model combines two training stages: weakly supervised contrastive pre-training on over 1B text pairs, followed by supervised fine-tuning on high-quality datasets across multiple languages. Input texts must carry a "query: " or "passage: " prefix for optimal performance, and the model integrates with popular frameworks such as PyTorch and Sentence Transformers.

  • Trained on massive multilingual datasets including mC4, CC News, NLLB, and Wikipedia
  • Fine-tuned on diverse tasks including MS MARCO, NQ, TriviaQA, and multilingual retrieval datasets
  • Achieves state-of-the-art performance on Mr. TyDi benchmark with 70.5% average MRR@10
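The prefix requirement above can be captured with two small helpers (a minimal sketch; the helper names are illustrative, and only the "query: " / "passage: " prefixes come from the model card):

```python
def format_query(text: str) -> str:
    # Search queries must be prefixed with "query: "
    return f"query: {text}"

def format_passage(text: str) -> str:
    # Candidate documents must be prefixed with "passage: "
    return f"passage: {text}"

# The formatted strings are then encoded, e.g. with sentence-transformers:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("intfloat/multilingual-e5-large")
#   embeddings = model.encode([format_query("how to bake bread")],
#                             normalize_embeddings=True)
print(format_query("how to bake bread"))  # query: how to bake bread
```

Omitting the prefixes is the most common cause of degraded retrieval quality with E5-family models, so it is worth centralizing this formatting in one place.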

Core Capabilities

  • Text embedding generation for 94 languages
  • Semantic search and information retrieval
  • Cross-lingual text similarity assessment
  • Document clustering and classification
  • Bitext mining and parallel text alignment
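To turn per-token transformer outputs into a single sentence embedding for the tasks above, the model card average-pools token states while masking out padding. A minimal NumPy sketch of that pooling step (the function name mirrors the card's `average_pool`; shapes and toy values are illustrative):

```python
import numpy as np

def average_pool(last_hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Zero out padding tokens, then average the remaining token vectors.

    last_hidden:    (batch, seq_len, hidden) token embeddings
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(last_hidden.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden * mask).sum(axis=1)                   # sum of real-token vectors
    counts = mask.sum(axis=1)                                   # number of real tokens
    return summed / counts

# Toy check: two tokens [1,1,1] and [3,3,3], both unmasked -> mean is [2,2,2].
hidden = np.array([[[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]])
mask = np.array([[1, 1]])
print(average_pool(hidden, mask))  # [[2. 2. 2.]]
```

Masking before averaging matters: with padding included, shorter texts would be pulled toward the zero vector.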

Frequently Asked Questions

Q: What makes this model unique?

The model combines extensive multilingual support with state-of-the-art performance across various tasks. Its two-stage training process and careful attention to prefix requirements make it particularly effective for real-world applications.

Q: What are the recommended use cases?

The model excels at cross-lingual information retrieval, semantic search, and text similarity tasks. It's particularly suitable for applications requiring multilingual understanding and can be used for clustering, classification, and parallel text mining.
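For retrieval, candidate passages are typically ranked by cosine similarity between the query embedding and each passage embedding. A minimal sketch with toy vectors standing in for real model outputs (the function name is hypothetical):

```python
import numpy as np

def rank_passages(query_vec: np.ndarray, passage_vecs: np.ndarray):
    # With L2-normalized embeddings, cosine similarity reduces to a dot product.
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q
    return np.argsort(-scores), scores  # indices best-first, plus raw scores

# Toy 2-D vectors: passage 1 points roughly the same way as the query.
query = np.array([0.9, 0.1])
passages = np.array([[0.1, 0.95],   # passage 0: dissimilar
                     [0.85, 0.2]])  # passage 1: similar
order, scores = rank_passages(query, passages)
print(order)  # [1 0]
```

In production the same ranking is usually delegated to a vector index (e.g. FAISS or a vector database) rather than a dense matrix product, but the similarity measure is the same.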
