e5-large

Maintained By
intfloat

E5-large Text Embedding Model

  • Parameter Count: 335M
  • Architecture: 24-layer Transformer with 1024-dimensional embeddings
  • License: MIT
  • Paper: Text Embeddings by Weakly-Supervised Contrastive Pre-training

What is e5-large?

E5-large is a powerful text embedding model designed for semantic search and similarity tasks. Developed through weakly-supervised contrastive pre-training, it generates high-quality 1024-dimensional embeddings for English text. The model requires specific prefix formatting ("query:" or "passage:") and can process sequences up to 512 tokens in length.
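Below is a minimal sketch of producing embeddings with the prefix convention described above, using the sentence-transformers library and assuming the checkpoint is available on the Hugging Face Hub as "intfloat/e5-large"; the example texts are illustrative only.

```python
# Minimal usage sketch (assumes the Hub id "intfloat/e5-large").
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-large")

# "query:" marks search queries; "passage:" marks documents to be retrieved.
texts = [
    "query: what is semantic search",
    "passage: Semantic search retrieves documents by meaning rather than exact keyword overlap.",
]
embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (2, 1024)
```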

Implementation Details

The model utilizes a 24-layer Transformer architecture and implements contrastive learning with a low temperature of 0.01 for the InfoNCE loss. It supports both PyTorch and Sentence-Transformers frameworks, making it versatile for different application scenarios.

  • Optimized for both symmetric (semantic similarity) and asymmetric (retrieval) tasks
  • Supports batch processing with automatic padding and truncation
  • Implements efficient average pooling for embedding generation (see the sketch after this list)
  • Achieves strong performance on BEIR and MTEB benchmarks
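The following sketch shows how the masked average-pooling step can be implemented with plain PyTorch and the transformers library; the model id "intfloat/e5-large" and the helper function average_pool are assumptions for illustration, not part of the library API.

```python
# Sketch of masked average pooling over token embeddings (assumed model id "intfloat/e5-large").
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Zero out padding positions, then average over the real tokens only.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-large")
model = AutoModel.from_pretrained("intfloat/e5-large")

texts = [
    "query: what is semantic search",
    "passage: Semantic search retrieves documents by meaning rather than exact keywords.",
]
# Batch processing with padding and truncation up to the 512-token limit.
batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-length 1024-d vectors
print(embeddings.shape)  # torch.Size([2, 1024])
```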

Core Capabilities

  • Text Retrieval and Semantic Search
  • Semantic Similarity Assessment
  • Classification and Clustering
  • Passage Ranking and Reranking
  • Cross-document Similarity Analysis

Frequently Asked Questions

Q: What makes this model unique?

E5-large's distinctive feature is its weakly-supervised contrastive pre-training approach, which enables strong performance across various text similarity tasks while maintaining efficient inference times. The model's careful handling of query and passage prefixes ensures optimal performance for different use cases.

Q: What are the recommended use cases?

The model excels at information retrieval, semantic search, and text similarity tasks. Use the "query:" prefix on both inputs for symmetric tasks such as semantic similarity, and the "query:"/"passage:" prefixes on the respective sides for asymmetric tasks such as passage retrieval. It is particularly effective for applications that require high-quality text embeddings for search or classification.
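As an illustration of the asymmetric case, the sketch below scores candidate passages against a query by cosine similarity of normalized embeddings; the model id and example texts are assumptions, and util.cos_sim is the sentence-transformers similarity helper.

```python
# Illustrative asymmetric retrieval: rank passages against a query (assumed model id).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-large")

query_emb = model.encode("query: best way to store coffee beans", normalize_embeddings=True)
passage_embs = model.encode(
    [
        "passage: Keep coffee beans in an airtight container away from light and heat.",
        "passage: Espresso machines require regular descaling to stay in good condition.",
    ],
    normalize_embeddings=True,
)

# Higher cosine similarity indicates a more relevant passage.
scores = util.cos_sim(query_emb, passage_embs)
print(scores)
```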
