multilingual-e5-small

intfloat

Multilingual text embedding model supporting 100+ languages, optimized for semantic search and retrieval. It uses a 12-layer transformer that produces 384-dimensional embeddings.

Property             Value
Architecture         12-layer transformer with 384-dim embeddings
Author               intfloat
Paper                arXiv:2402.05672
Languages Supported  100+ languages

What is multilingual-e5-small?

Multilingual-E5-Small is a compact text embedding model designed for cross-lingual understanding. It is initialized from microsoft/Multilingual-MiniLM-L12-H384 and then further trained on a diverse collection of multilingual datasets totaling over 5 billion text pairs.

Implementation Details

The model employs a two-stage training approach: first, contrastive pre-training with weak supervision across multiple data sources including mC4, CC News, and NLLB, followed by supervised fine-tuning on specific tasks. The architecture features 12 transformer layers and produces 384-dimensional embeddings.
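The contrastive stage described above can be illustrated with a small loss function. The text does not say which objective is used, so the sketch below assumes an InfoNCE loss with in-batch negatives, a common choice for this kind of training. The function name `info_nce_loss` and the default temperature are hypothetical, and the vectors are assumed to be L2-normalized.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(
    queries: Sequence[Sequence[float]],
    passages: Sequence[Sequence[float]],
    temperature: float = 0.05,  # hypothetical value, not taken from the source
) -> float:
    """Mean InfoNCE loss over a batch.

    passages[i] is treated as the positive for queries[i]; every other
    passage in the batch acts as a negative. Inputs are assumed to be
    L2-normalized, so the dot product equals cosine similarity.
    """
    if len(queries) != len(passages):
        raise ValueError("queries and passages must have the same length")
    if not queries:
        raise ValueError("batch must not be empty")

    losses = []
    for i, query in enumerate(queries):
        logits = [dot(query, passage) / temperature for passage in passages]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned pairs:", info_nce_loss(q, aligned))
    print("swapped pairs:", info_nce_loss(q, swapped))
```

The loss is small when each query is closest to its own passage and grows when another passage in the batch scores higher, which is the behavior contrastive training optimizes for.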

  • Supports text embedding generation for 100+ languages
  • Optimized for retrieval and semantic search tasks
  • Requires a "query:" or "passage:" prefix on each input text for optimal performance
  • Maximum sequence length of 512 tokens
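The prefix and length constraints listed above can be handled with a couple of small helpers. This is a minimal sketch under stated assumptions, not the authors' code: `as_query`, `as_passage`, and `embed` are hypothetical names, the trailing space after each prefix is an assumption, and `embed` assumes the model is available as `intfloat/multilingual-e5-small` through the sentence-transformers library.

```python
from typing import List

QUERY_PREFIX = "query: "      # trailing space is an assumption
PASSAGE_PREFIX = "passage: "  # trailing space is an assumption


def as_query(text: str) -> str:
    """Prepend the query prefix to a search query."""
    return QUERY_PREFIX + text.strip()


def as_passage(text: str) -> str:
    """Prepend the passage prefix to a document that will be indexed."""
    return PASSAGE_PREFIX + text.strip()


def embed(texts: List[str]):
    """Encode already-prefixed texts into normalized embeddings.

    Assumes sentence-transformers is installed and that the model ID
    below is correct; neither is stated in the text above.
    """
    # Imported lazily so the prefix helpers work without the dependency.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/multilingual-e5-small")  # assumed model ID
    model.max_seq_length = 512  # longer inputs are truncated to this many tokens
    return model.encode(texts, normalize_embeddings=True)


if __name__ == "__main__":
    print(as_query("how much protein should a female eat"))
    print(as_passage("Pumpkin soup is a simple dish to prepare at home."))
```

In a retrieval setup, search queries would go through `as_query` and indexed documents through `as_passage` before both are passed to `embed`.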

Core Capabilities

  • Cross-lingual semantic search and retrieval
  • Document similarity analysis
  • Multilingual question answering
  • Text classification and clustering
  • Reported average MRR@10 of 64.4% on the Mr. TyDi benchmark
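For readers unfamiliar with the MRR@10 metric mentioned above, the sketch below shows how it is typically computed: for each query, take the reciprocal of the rank of the first relevant document within the top 10 results (or 0 if none appears), then average over queries. The function name `mrr_at_k` and the toy document IDs are hypothetical and are not drawn from the benchmark.

```python
from typing import Sequence, Set


def mrr_at_k(
    rankings: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank, counting only the top-k results per query.

    rankings[i] is the ranked list of document IDs returned for query i;
    relevant[i] is the set of document IDs judged relevant for query i.
    """
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0

    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


if __name__ == "__main__":
    # Toy data: first hit at rank 1, rank 3, and no hit at all.
    toy_rankings = [["a", "b"], ["x", "y", "z"], ["m"]]
    toy_relevant = [{"a"}, {"z"}, {"q"}]
    print(mrr_at_k(toy_rankings, toy_relevant))
```

In the toy data the three queries contribute 1, 1/3, and 0, so the average is 4/9.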

Frequently Asked Questions

Q: What makes this model unique?

The model's extensive multilingual training on diverse datasets and its efficient architecture make it particularly effective for cross-lingual applications while maintaining a relatively small size.

Q: What are the recommended use cases?

The model excels in cross-lingual information retrieval, semantic search, and text similarity tasks. It's particularly useful for applications requiring multilingual understanding with limited computational resources.
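Once queries and passages have been embedded, retrieval reduces to ranking passages by vector similarity. The sketch below assumes cosine similarity as the scoring function and uses 2-dimensional toy vectors in place of the model's 384-dimensional embeddings. `cosine` and `rank_passages` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_passages(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the top_k most similar passages."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings.
    query = [1.0, 0.0]
    passages = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    for index, score in rank_passages(query, passages):
        print(index, round(score, 3))
```

For larger collections, an approximate nearest-neighbor index would usually replace this linear scan, but the scoring idea stays the same.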
