gte-large-zh

Maintained by: thenlper

Property                   Value
Parameters                 326M
Maximum Sequence Length    512 tokens
Embedding Dimension        1024
License                    MIT
Paper                      Research Paper

What is gte-large-zh?

GTE-large-zh is a Chinese text embedding model developed by Alibaba DAMO Academy. It maps Chinese text to dense vectors that downstream NLP tasks such as retrieval, similarity scoring, and classification can consume. The model reports an average score of 66.72 across the 35 datasets of the CMTEB benchmark, which places it ahead of other popular models in that comparison.

Implementation Details

Built on the BERT architecture, GTE-large-zh employs multi-stage contrastive learning on a diverse corpus of relevance text pairs. The model generates 1024-dimensional embeddings and can process sequences up to 512 tokens in length.

  • Scores 71.34 on average across CMTEB classification tasks
  • Scores 53.07 on average across CMTEB clustering tasks
  • Scores 81.14 on average across CMTEB pair classification tasks
  • Scores 67.42 on reranking and 72.49 on retrieval tasks
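
As an illustration of the embedding step described above, here is a minimal sketch. It assumes the model is published on the Hugging Face Hub as `thenlper/gte-large-zh` and can be loaded through the `sentence-transformers` library; the helper names `embed` and `validate_embedding` are hypothetical and exist only for this example.

```python
import math
from typing import List, Sequence

MODEL_ID = "thenlper/gte-large-zh"  # assumed Hugging Face Hub id
EMBEDDING_DIM = 1024                # embedding dimension from the property table
MAX_SEQ_LENGTH = 512                # tokens; longer inputs are truncated


def embed(texts: Sequence[str]) -> List[List[float]]:
    """Encode texts into L2-normalized vectors (downloads the model on first use)."""
    # Imported lazily so the validation helper works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    model.max_seq_length = MAX_SEQ_LENGTH
    vectors = model.encode(list(texts), normalize_embeddings=True)
    return vectors.tolist()


def validate_embedding(vector: Sequence[float], tol: float = 1e-3) -> bool:
    """Return True if the vector has the expected size and unit L2 norm."""
    if len(vector) != EMBEDDING_DIM:
        return False
    norm = math.sqrt(sum(x * x for x in vector))
    return abs(norm - 1.0) <= tol
```

Under these assumptions, `embed(["今天天气很好", "今天是晴天"])` would return two lists of 1024 floats, and `validate_embedding` is a cheap sanity check to run before writing vectors to an index.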

Core Capabilities

  • Information Retrieval
  • Semantic Textual Similarity
  • Text Reranking
  • Document Classification
  • Clustering Applications
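
Retrieval, similarity, and reranking all reduce to comparing embedding vectors. The sketch below shows one common way to do that, cosine similarity followed by a sort. It uses toy 3-dimensional vectors in place of real 1024-dimensional embeddings, and the function names are hypothetical, not part of any library.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for model embeddings.
query = [1.0, 0.0, 0.0]
documents = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # partially aligned with the query
]
ranking = rank_documents(query, documents, top_k=2)
print(ranking)
```

With real embeddings, the same ranking function can serve as a first-pass retriever or as a reranker over a candidate list. For large collections, an approximate nearest-neighbor index would normally replace the linear scan.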

Frequently Asked Questions

Q: What makes this model unique?

GTE-large-zh combines strong CMTEB results with a relatively compact size of 326M parameters. Its scores are balanced across task types rather than concentrated in a single one, and it is trained specifically for Chinese text.

Q: What are the recommended use cases?

The model suits applications that need semantic understanding of Chinese text, including search systems, recommendation engines, document similarity analysis, and content classification. It is also a candidate for large-scale deployments where embedding quality matters.
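
For document similarity on long texts, the 512-token limit means anything beyond the limit is not seen by the model. One common workaround, sketched here as an assumption rather than a documented recommendation, is to split the token sequence into overlapping windows and embed each window separately. The function name and the default values are hypothetical. The default of 510 assumes a BERT-style tokenizer that adds two special tokens per input.

```python
from typing import List, Sequence


def split_into_windows(
    token_ids: Sequence[int],
    max_len: int = 510,   # assumed: 512 minus two special tokens
    overlap: int = 64,    # assumed overlap; tune for the application
) -> List[List[int]]:
    """Split a token sequence into overlapping windows of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap
    windows: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        windows.append(list(token_ids[start:start + max_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += step
    return windows


# Example with placeholder token ids instead of real tokenizer output.
windows = split_into_windows(list(range(1000)))
print([len(w) for w in windows])
```

Each window can then be embedded on its own, and the per-window vectors either indexed separately or averaged into a single document vector, depending on whether passage-level or document-level matches are wanted.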
