E5-large-v2
| Property | Value |
|---|---|
| Parameters | 335M |
| Architecture | 24 layers, 1024-dimensional embeddings |
| License | MIT |
| Paper | Text Embeddings by Weakly-Supervised Contrastive Pre-training |
What is e5-large-v2?
E5-large-v2 is a text embedding model trained with weakly-supervised contrastive pre-training. It is designed to produce general-purpose text embeddings for tasks such as semantic search, similarity matching, and information retrieval, and it performs strongly on benchmarks including BEIR and MTEB.
Implementation Details
The model uses a 24-layer transformer encoder and produces 1024-dimensional embeddings, with a maximum sequence length of 512 tokens. Inputs must be formatted with a "query:" or "passage:" prefix depending on the use case. The model can be used either through plain PyTorch (via Transformers) or through Sentence Transformers; a usage sketch follows the list below.
- Optimized for both symmetric and asymmetric similarity tasks
- Supports batch processing; embeddings are obtained by average pooling over token representations
- Implements temperature-scaled contrastive learning (0.01)
- Outputs embeddings that are typically L2-normalized for cosine-similarity scoring
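The sketch below illustrates this workflow with the Hugging Face Transformers API: tokenize with a 512-token limit, average-pool the token representations, and L2-normalize before scoring. The model ID intfloat/e5-large-v2 and the example texts are assumptions used for illustration.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel


def average_pool(last_hidden_states, attention_mask):
    # Zero out padding positions, then average the remaining token embeddings.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


# Model ID assumed here; adjust if your checkpoint lives elsewhere.
tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-large-v2")
model = AutoModel.from_pretrained("intfloat/e5-large-v2")

# Asymmetric retrieval: "query:" for the question, "passage:" for candidate documents.
texts = [
    "query: how much protein should a female eat",
    "passage: The recommended daily protein intake for women is about 46 grams.",
]

batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # L2-normalize for cosine similarity

# With unit-length vectors, the dot product is the cosine similarity.
score = (embeddings[0] @ embeddings[1]).item()
print(score)
```

Inputs longer than 512 tokens are truncated, so long passages should be chunked before encoding if full coverage matters.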
Core Capabilities
- Text Retrieval and Semantic Search
- Semantic Similarity Assessment
- Classification and Clustering
- Paraphrase Detection
- Information Retrieval Tasks
Frequently Asked Questions
Q: What makes this model unique?
The model's key strength lies in its weakly-supervised contrastive pre-training and its versatility across text embedding tasks. It achieves strong benchmark results while remaining relatively compact at 335M parameters.
Q: What are the recommended use cases?
The model excels at semantic search, passage retrieval, and similarity matching. For asymmetric tasks like QA retrieval, prefix queries with "query:" and documents with "passage:". For symmetric tasks like semantic similarity, use the "query:" prefix for both inputs.
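As a brief sketch of the symmetric case, the snippet below encodes two sentences through the Sentence Transformers wrapper, both with the "query:" prefix; the model ID intfloat/e5-large-v2 and the example sentences are assumptions for illustration.

```python
from sentence_transformers import SentenceTransformer

# Model ID assumed; swap in your own checkpoint path if different.
model = SentenceTransformer("intfloat/e5-large-v2")

# Symmetric similarity: both inputs get the "query:" prefix.
sentences = [
    "query: A man is eating food.",
    "query: A man is eating a piece of bread.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# With normalized embeddings, the dot product equals cosine similarity.
similarity = embeddings[0] @ embeddings[1]
print(float(similarity))
```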