gte-base

by thenlper

GTE-base is a 109M-parameter text embedding model that produces 768-dimensional embeddings. It is optimized for semantic similarity tasks and achieves strong MTEB benchmark results at modest compute cost.

Property                 Value
Parameter Count          109M
Embedding Dimension      768
Max Sequence Length      512 tokens
License                  MIT
Paper                    arXiv:2308.03281

What is gte-base?

GTE-base (General Text Embeddings) is a medium-sized text embedding model developed by Alibaba DAMO Academy. It represents a balanced trade-off between model size and performance, achieving an impressive 62.39% average score across 56 MTEB benchmark tasks. The model is specifically designed for generating high-quality text embeddings that can be used for various natural language processing tasks.

Implementation Details

Built on the BERT architecture, GTE-base produces 768-dimensional embeddings and can process sequences up to 512 tokens in length. The model was trained using a multi-stage contrastive learning approach on a diverse dataset of relevance text pairs, enabling robust semantic understanding across different domains.

  • Efficient architecture with 109M parameters
  • Strong performance in clustering (46.2%), pair classification (84.57%), and semantic textual similarity (82.3%)
  • Optimized for both accuracy and computational efficiency
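As a rough usage sketch, the checkpoint published as thenlper/gte-base on the Hugging Face Hub can be loaded with the sentence-transformers library; the example sentences below are purely illustrative:

```python
from sentence_transformers import SentenceTransformer, util

# Load the 109M-parameter GTE-base checkpoint (downloaded from the Hugging Face Hub).
model = SentenceTransformer("thenlper/gte-base")

sentences = [
    "What is the capital of France?",
    "Paris is the capital and largest city of France.",
    "The mitochondria is the powerhouse of the cell.",
]

# Encode into 768-dimensional embeddings; inputs longer than 512 tokens are truncated.
embeddings = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity between the question and each candidate sentence.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)  # the semantically related sentence should score noticeably higher
```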

Core Capabilities

  • Information Retrieval and Document Search
  • Semantic Textual Similarity Assessment
  • Text Reranking and Classification
  • English Text Understanding (the model is trained on English text only and is not intended for multilingual or cross-lingual use)
  • Efficient Text Embedding Generation
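To illustrate the retrieval capability, here is a minimal document-search sketch, again assuming sentence-transformers is installed; the corpus and query strings are hypothetical placeholders:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("thenlper/gte-base")

# Tiny illustrative corpus; in practice this would be your document collection.
corpus = [
    "GTE-base produces 768-dimensional text embeddings.",
    "The Eiffel Tower is located in Paris.",
    "Contrastive learning trains encoders on pairs of related texts.",
]
corpus_embeddings = model.encode(corpus, normalize_embeddings=True)

query = "How are text embedding models trained?"
query_embedding = model.encode(query, normalize_embeddings=True)

# Rank corpus documents by cosine similarity to the query and keep the top 2.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```

The same pattern extends to reranking or classification pipelines: embed once, then score candidates against the query (or class prototypes) by cosine similarity.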

Frequently Asked Questions

Q: What makes this model unique?

GTE-base stands out for its excellent performance-to-size ratio, achieving comparable results to larger models while being more resource-efficient. It ranks highly on the MTEB leaderboard and provides a strong balance between computational requirements and embedding quality.

Q: What are the recommended use cases?

The model excels in information retrieval, semantic similarity tasks, and text classification. It's particularly well-suited for applications requiring efficient text embeddings without compromising on quality, such as search systems, recommendation engines, and document similarity analysis.
