gte-large

Maintained By
thenlper

GTE-Large: General Text Embeddings Model

Property              Value
Parameter Count       335M
Dimension             1024
Max Sequence Length   512
License               MIT
Paper                 arXiv:2308.03281

What is gte-large?

GTE-Large is a state-of-the-art text embedding model developed by Alibaba DAMO Academy. It represents the largest variant in the GTE family, designed specifically for generating high-quality text embeddings through multi-stage contrastive learning. The model achieves an impressive 63.13% average score on the MTEB benchmark, outperforming other popular models like E5-large-v2 and OpenAI's text-embedding-ada-002.

Implementation Details

Built on the BERT architecture, GTE-Large generates 1024-dimensional embeddings and can process sequences up to 512 tokens in length. The model is trained on a diverse corpus of relevance text pairs, enabling robust performance across various domains.

  • Advanced multi-stage contrastive learning approach
  • Optimized for both semantic similarity and information retrieval tasks
  • Supports batch processing with optional embedding normalization
  • Implements efficient average pooling for token aggregation
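The average-pooling step above can be illustrated in isolation: token embeddings are summed with padding positions masked out, then divided by the count of real tokens, and the result is optionally L2-normalized. This is a minimal NumPy sketch of that math; in practice the hidden states and attention mask would come from running the model through a library such as Hugging Face Transformers, and the array shapes here are stand-ins.

```python
import numpy as np

def average_pool(last_hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padding (mask == 0)."""
    mask = attention_mask[:, :, None].astype(last_hidden.dtype)
    summed = (last_hidden * mask).sum(axis=1)
    counts = mask.sum(axis=1)              # number of real tokens per sequence
    return summed / counts

# Dummy hidden states: batch of 2 sequences, 4 token positions, 1024 dims
# (1024 matches gte-large's embedding dimension; the rest is illustrative).
hidden = np.random.randn(2, 4, 1024)
mask = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0]])            # trailing zeros are padding tokens

pooled = average_pool(hidden, mask)        # shape (2, 1024)
embeddings = pooled / np.linalg.norm(pooled, axis=1, keepdims=True)  # optional L2 norm
```

Normalizing the pooled vectors means downstream cosine similarity reduces to a simple dot product.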

Core Capabilities

  • Information Retrieval (52.22% MTEB score)
  • Semantic Textual Similarity (83.35% MTEB score)
  • Text Reranking (59.13% MTEB score)
  • Clustering (46.84% MTEB score)
  • Classification Tasks (73.33% MTEB score)

Frequently Asked Questions

Q: What makes this model unique?

GTE-Large combines large-scale parameter capacity with multi-stage contrastive learning, achieving superior performance while maintaining a relatively compact size compared to other large language models. It particularly excels in semantic similarity tasks and provides a good balance between model size and performance.

Q: What are the recommended use cases?

The model is particularly well-suited for text similarity comparison, document retrieval, semantic search, and content recommendation systems. It's especially effective for applications requiring high-quality text embeddings with reasonable computational requirements.
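For the retrieval and semantic-search use cases, ranking amounts to comparing a query embedding against document embeddings by cosine similarity. The sketch below assumes the vectors were already produced by an embedding model; the 4-dimensional toy vectors and the `top_k` helper are illustrative stand-ins for gte-large's 1024-dimensional output.

```python
import numpy as np

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 3):
    """Return indices and cosine-similarity scores of the k closest documents."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarity via dot product
    order = np.argsort(-scores)[:k]      # highest similarity first
    return order, scores[order]

# Toy embeddings standing in for gte-large vectors.
docs = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.7, 0.7, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])

idx, scores = top_k(query, docs, k=2)    # best-matching document indices
```

At production scale the brute-force dot product would typically be replaced by an approximate-nearest-neighbor index, but the similarity measure stays the same.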
