GTE-Large: General Text Embeddings Model
| Property | Value |
|---|---|
| Parameter Count | 335M |
| Embedding Dimension | 1024 |
| Max Sequence Length | 512 tokens |
| License | MIT |
| Paper | arXiv:2308.03281 |
What is GTE-Large?
GTE-Large is a general-purpose text embedding model developed by Alibaba DAMO Academy. It is the largest variant in the GTE family and is trained to produce high-quality text embeddings through multi-stage contrastive learning. The model achieves an average score of 63.13 on the MTEB benchmark, outperforming popular alternatives such as E5-large-v2 and OpenAI's text-embedding-ada-002.
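The exact multi-stage training recipe is described in arXiv:2308.03281. As a rough orientation only, the sketch below shows the in-batch-negatives InfoNCE loss that this family of contrastive objectives builds on; the function name and temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor,
                  doc_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """In-batch-negatives contrastive loss: each query's positive is the
    document at the same batch index; every other document in the batch
    serves as a negative. Temperature is an illustrative choice."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    # Cosine-similarity logits between every query and every document.
    logits = query_emb @ doc_emb.T / temperature
    # The matching (positive) pair sits on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

Training on this kind of objective pulls embeddings of related text pairs together and pushes unrelated pairs apart, which is what makes the resulting vectors useful for similarity and retrieval.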
Implementation Details
Built on the BERT architecture, GTE-Large generates 1024-dimensional embeddings and can process sequences up to 512 tokens in length. The model is trained on a diverse corpus of relevance text pairs, enabling robust performance across various domains.
- Advanced multi-stage contrastive learning approach
- Optimized for both semantic similarity and information retrieval tasks
- Supports batch processing with optional embedding normalization
- Implements efficient average pooling for token aggregation (see the usage sketch after this list)
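The following is a minimal usage sketch of the standard mean-pooling pattern for BERT-style embedders, covering the points above: tokenize, run the encoder, average-pool the token states under the attention mask, and optionally L2-normalize. The Hub ID `thenlper/gte-large` and the example sentences are assumptions for illustration.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def average_pool(last_hidden_states: torch.Tensor,
                 attention_mask: torch.Tensor) -> torch.Tensor:
    # Zero out padding positions, then average over the sequence dimension.
    hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-large")
model = AutoModel.from_pretrained("thenlper/gte-large")

texts = ["what is the capital of China?",
         "Beijing is the capital of China."]
batch = tokenizer(texts, max_length=512, padding=True,
                  truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
# Optional L2 normalization so dot products equal cosine similarities.
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)              # torch.Size([2, 1024])
print(embeddings[0] @ embeddings[1]) # cosine similarity of the pair
```

Normalizing the embeddings makes a plain dot product equivalent to cosine similarity, which simplifies downstream scoring.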
Core Capabilities
- Information Retrieval (52.22 MTEB average)
- Semantic Textual Similarity (83.35 MTEB average)
- Text Reranking (59.13 MTEB average)
- Clustering (46.84 MTEB average)
- Classification (73.33 MTEB average)
Frequently Asked Questions
Q: What makes this model unique?
GTE-Large combines a BERT-large-scale encoder with multi-stage contrastive learning, delivering strong benchmark performance while remaining far smaller than LLM-based embedding models. It is particularly strong on semantic similarity tasks and offers a good balance between model size and embedding quality.
Q: What are the recommended use cases?
The model is well suited to text similarity comparison, document retrieval, semantic search, and content recommendation systems. It is especially attractive for applications that need high-quality embeddings at moderate computational cost; a minimal retrieval sketch follows below.
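As one concrete example of the retrieval use case, here is a sketch of semantic search via the sentence-transformers library, which can load the checkpoint directly. The corpus, query, and `top_k` choice are illustrative assumptions.

```python
from sentence_transformers import SentenceTransformer, util

# Assumes the sentence-transformers package and the thenlper/gte-large checkpoint.
model = SentenceTransformer("thenlper/gte-large")

corpus = [
    "GTE-Large produces 1024-dimensional embeddings.",
    "The Great Wall of China stretches for thousands of kilometers.",
    "Contrastive learning pulls related text pairs together.",
]
corpus_emb = model.encode(corpus, normalize_embeddings=True, convert_to_tensor=True)

query = "How are text embedding models trained?"
query_emb = model.encode(query, normalize_embeddings=True, convert_to_tensor=True)

# Cosine similarity ranks corpus entries against the query.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```

For larger corpora, the same embeddings can be indexed in an approximate-nearest-neighbor store instead of being compared exhaustively.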