gte-base

Maintained by: thenlper

GTE-Base Text Embedding Model

  • Parameter Count: 109M
  • Embedding Dimension: 768
  • Max Sequence Length: 512 tokens
  • License: MIT
  • Paper: arXiv:2308.03281

What is gte-base?

GTE-base (General Text Embeddings) is a medium-sized text embedding model developed by Alibaba DAMO Academy. It offers a balanced trade-off between model size and performance, achieving an average score of 62.39% across 56 MTEB benchmark tasks. The model is designed to generate high-quality text embeddings for a wide range of natural language processing tasks.

Implementation Details

Built on the BERT architecture, GTE-base produces 768-dimensional embeddings and can process sequences up to 512 tokens in length. The model was trained using a multi-stage contrastive learning approach on a diverse dataset of relevance text pairs, enabling robust semantic understanding across different domains.

  • Efficient architecture with 109M parameters
  • Strong performance in clustering (46.2%), pair classification (84.57%), and semantic textual similarity (82.3%)
  • Optimized for both accuracy and computational efficiency
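The snippet below is a minimal usage sketch showing how such embeddings are typically produced with the Hugging Face transformers library: tokenize up to the 512-token limit, mean-pool the token outputs, and L2-normalize to obtain one 768-dimensional vector per input. The checkpoint id "thenlper/gte-base" and the example sentences are illustrative assumptions, not details specified on this page.

```python
# Minimal sketch: generating GTE-base sentence embeddings with transformers.
# Assumes the checkpoint is published on the Hub as "thenlper/gte-base".
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-base")
model = AutoModel.from_pretrained("thenlper/gte-base")

sentences = ["what is the capital of France?", "Paris is the capital of France."]

# Tokenize, truncating to the model's 512-token limit.
batch = tokenizer(sentences, max_length=512, padding=True,
                  truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Mean-pool the token embeddings (ignoring padding) into one
# 768-dimensional vector per sentence, then L2-normalize.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 768])
```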

Core Capabilities

  • Information Retrieval and Document Search
  • Semantic Textual Similarity Assessment (see the sketch after this list)
  • Text Reranking and Classification
  • English-Language Text Understanding (the model is English-only, not multilingual)
  • Efficient Text Embedding Generation
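As a quick illustration of the similarity capability, the hedged sketch below scores a sentence pair with the sentence-transformers library. The checkpoint id "thenlper/gte-base" and the sentence pair are assumptions chosen for illustration.

```python
# Minimal sketch: semantic textual similarity with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("thenlper/gte-base")  # assumed Hub checkpoint id
emb = model.encode(
    ["A man is eating food.", "A man is eating a piece of bread."],
    normalize_embeddings=True,  # unit-norm vectors so cosine == dot product
)
# Cosine similarity in [-1, 1]; higher means more semantically similar.
print(util.cos_sim(emb[0], emb[1]).item())
```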

Frequently Asked Questions

Q: What makes this model unique?

GTE-base stands out for its excellent performance-to-size ratio, achieving comparable results to larger models while being more resource-efficient. It ranks highly on the MTEB leaderboard and provides a strong balance between computational requirements and embedding quality.

Q: What are the recommended use cases?

The model excels in information retrieval, semantic similarity tasks, and text classification. It's particularly well-suited for applications requiring efficient text embeddings without compromising on quality, such as search systems, recommendation engines, and document similarity analysis.
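For instance, a minimal document-search flow might embed a small corpus once, embed each incoming query, and rank documents by cosine similarity, as sketched below. The corpus, query, and checkpoint id are illustrative assumptions rather than anything prescribed by the model.

```python
# Minimal sketch: toy semantic search ranked by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("thenlper/gte-base")  # assumed Hub checkpoint id

corpus = [
    "How to reset a forgotten password",
    "Steps to configure two-factor authentication",
    "Troubleshooting slow database queries",
]
# Embed the corpus once; in a real system these vectors would be stored/indexed.
corpus_emb = model.encode(corpus, normalize_embeddings=True)

query_emb = model.encode("I can't log in to my account", normalize_embeddings=True)

scores = np.dot(corpus_emb, query_emb)  # cosine similarity (vectors are unit-norm)
ranking = np.argsort(-scores)           # best match first
for idx in ranking:
    print(f"{scores[idx]:.3f}  {corpus[idx]}")
```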
