gte-large

by thenlper

Advanced text embedding model with 335M parameters, achieving SOTA performance on MTEB benchmark. Specialized for semantic similarity and retrieval tasks.

  • Parameter Count: 335M
  • Embedding Dimension: 1024
  • Max Sequence Length: 512
  • License: MIT
  • Paper: arXiv:2308.03281

What is gte-large?

GTE-Large is a state-of-the-art text embedding model developed by Alibaba DAMO Academy. It is the largest variant in the GTE family, designed specifically for generating high-quality text embeddings through multi-stage contrastive learning. The model achieves an average score of 63.13 on the MTEB benchmark, outperforming other popular models such as E5-large-v2 and OpenAI's text-embedding-ada-002.

Implementation Details

Built on the BERT architecture, GTE-Large produces 1024-dimensional embeddings and accepts input sequences of up to 512 tokens; longer inputs are truncated. The model is trained on a large, diverse corpus of text relevance pairs, which underpins its robust performance across domains.

  • Advanced multi-stage contrastive learning approach
  • Optimized for both semantic similarity and information retrieval tasks
  • Supports batch processing with optional embedding normalization
  • Implements efficient average pooling for token aggregation
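The masked average pooling mentioned above can be sketched in a few lines. This is a minimal NumPy illustration of the general technique (zero out padding positions via the attention mask, then average the remaining token vectors), not the model's exact implementation; the function name and dummy tensors are made up for the example.

```python
import numpy as np

def average_pool(last_hidden_states, attention_mask):
    # Zero out padding positions, then average over real tokens only.
    mask = attention_mask[..., None].astype(last_hidden_states.dtype)
    summed = (last_hidden_states * mask).sum(axis=1)
    counts = mask.sum(axis=1)
    return summed / counts

# Dummy batch: 2 sequences, 4 token positions, 1024-dim hidden states
# (1024 matches gte-large's embedding dimension).
hidden = np.ones((2, 4, 1024))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # trailing zeros are padding
emb = average_pool(hidden, mask)
print(emb.shape)  # (2, 1024)
```

In practice the same pooling is applied to the last hidden states returned by the Transformer, followed by optional L2 normalization of the resulting sentence embedding.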

Core Capabilities

  • Information Retrieval (52.22% MTEB score)
  • Semantic Textual Similarity (83.35% MTEB score)
  • Text Reranking (59.13% MTEB score)
  • Clustering (46.84% MTEB score)
  • Classification Tasks (73.33% MTEB score)
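Similarity-oriented tasks like the ones above are typically scored with cosine similarity between embeddings; once the vectors are L2-normalized, this reduces to a plain dot product. A toy sketch with made-up 4-dimensional vectors standing in for real 1024-dimensional gte-large embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity = dot product of L2-normalized vectors.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

query = np.array([1.0, 0.0, 1.0, 0.0])
doc = np.array([1.0, 0.0, 0.0, 0.0])
print(round(cosine_similarity(query, doc), 4))  # 0.7071
```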

Frequently Asked Questions

Q: What makes this model unique?

GTE-Large combines large-scale parameter capacity with multi-stage contrastive learning, achieving superior performance while maintaining a relatively compact size compared to other large language models. It particularly excels in semantic similarity tasks and provides a good balance between model size and performance.

Q: What are the recommended use cases?

The model is particularly well-suited for text similarity comparison, document retrieval, semantic search, and content recommendation systems. It's especially effective for applications requiring high-quality text embeddings with reasonable computational requirements.
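A semantic-search pipeline built on such embeddings usually embeds the corpus once, then ranks documents against each query embedding by normalized dot product. The sketch below uses tiny made-up vectors in place of real model outputs; the `search` helper and its signature are illustrative, not part of any library.

```python
import numpy as np

def search(query_emb, doc_embs, top_k=2):
    """Rank documents by dot product of L2-normalized embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Toy corpus of three 3-dim "embeddings" (real ones would be 1024-dim).
docs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [1.0, 1.0, 0.0]])
query = np.array([1.0, 0.2, 0.0])
print(search(query, docs))
```

For large corpora the brute-force matrix product is typically replaced by an approximate nearest-neighbor index, but the ranking logic stays the same.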
