bge-large-zh

BAAI

Large-scale Chinese text embedding model (326M parameters) optimized for retrieval and similarity tasks. It produces 1024-dimensional embeddings and reports state-of-the-art results on the C-MTEB benchmark.

Property             Value
Parameter Count      326M
Embedding Dimension  1024
License              MIT
Paper                C-Pack: Packaged Resources To Advance General Chinese Embedding

What is bge-large-zh?

BGE-large-zh is a Chinese text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) that generates dense vector representations of Chinese text. It is designed for semantic search, retrieval, and similarity computation, and it reports top results on the C-MTEB benchmark.

Implementation Details

The model uses a transformer-based architecture and was trained with RetroMAE pre-training followed by contrastive learning on large-scale paired data. It generates 1024-dimensional embeddings and accepts input sequences of up to 512 tokens.

  • Optimized for both retrieval and similarity tasks
  • Supports efficient inference with FP16 precision
  • Supports an instruction prefix on queries for improved retrieval performance
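
The settings described above (1024-dimensional output, 512-token limit, optional FP16) can be sketched roughly as follows. This is a minimal illustration, not the official usage: it assumes the model is published on the Hugging Face Hub as "BAAI/bge-large-zh" and that the sentence-transformers library is installed. The helper names load_model and embed are hypothetical.

```python
from typing import Sequence

MODEL_NAME = "BAAI/bge-large-zh"  # assumption: Hugging Face Hub identifier
EMBEDDING_DIM = 1024              # embedding size stated on this page
MAX_TOKENS = 512                  # maximum sequence length stated on this page


def load_model(use_fp16: bool = False):
    """Load the model lazily (hypothetical helper).

    The import sits inside the function so the rest of the module can be
    used without sentence-transformers installed.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)
    model.max_seq_length = MAX_TOKENS  # longer inputs are truncated
    if use_fp16:
        model.half()  # half precision; mainly useful on a GPU
    return model


def embed(model, texts: Sequence[str]):
    """Encode texts into L2-normalized vectors and check their width."""
    vectors = model.encode(list(texts), normalize_embeddings=True)
    if vectors.shape[1] != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM}-d embeddings, got {vectors.shape[1]}"
        )
    return vectors
```

A typical call would be embed(load_model(), ["今天天气很好", "我喜欢读书"]). Normalizing the vectors means a plain dot product between two of them equals their cosine similarity.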

Core Capabilities

  • High-quality Chinese text embeddings generation
  • State-of-the-art performance on C-MTEB benchmark
  • Efficient similarity computation and semantic search
  • Cross-lingual capabilities
  • Support for short texts and longer passages, up to the 512-token input limit
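
To make the similarity computation and semantic search items concrete, here is a small dependency-free sketch of cosine similarity and top-k ranking over embedding vectors. The function names are illustrative only, and the vectors in the usage note are toy 2-dimensional values rather than real model output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [
        (i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2) ranks document 1 first and document 2 second. In practice a vector database or a NumPy matrix product would replace the Python loops for large collections.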

Frequently Asked Questions

Q: What makes this model unique?

BGE-large-zh stands out for its performance on Chinese text embedding tasks: it ranked first on the C-MTEB benchmark at the time of its release. It is specifically optimized for retrieval and supports an instruction prefix on queries that improves retrieval performance.

Q: What are the recommended use cases?

The model is ideal for semantic search, document retrieval, similarity calculation, and building vector databases for LLMs. It's particularly effective when used with an instruction prefix for query-passage retrieval tasks.
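
The instruction-prefix pattern for query-passage retrieval could look like the sketch below. The instruction string is the one the BGE project documents for its Chinese models, but treat it as an assumption and confirm it against the model card for the exact version you use. The model argument is assumed to be a sentence-transformers SentenceTransformer (or any object with a compatible encode method), and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: query instruction documented for BGE Chinese models.
# It is prepended to queries only; passages are encoded without it.
QUERY_INSTRUCTION = "为这个句子生成表示以用于检索相关文章："


def prepare_query(query: str) -> str:
    """Prepend the retrieval instruction to a short query."""
    return QUERY_INSTRUCTION + query.strip()


def prepare_passages(passages: Sequence[str]) -> List[str]:
    """Passages are left as-is apart from whitespace trimming."""
    return [p.strip() for p in passages]


def retrieve(
    model, query: str, passages: Sequence[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank passages against a query by dot product of normalized embeddings."""
    q_vec = model.encode([prepare_query(query)], normalize_embeddings=True)[0]
    p_vecs = model.encode(prepare_passages(passages), normalize_embeddings=True)
    # With normalized vectors the dot product equals cosine similarity.
    scores = [float(sum(a * b for a, b in zip(q_vec, p))) for p in p_vecs]
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [(passages[i], scores[i]) for i in order[:k]]
```

The prefix is intended for the short-query-to-long-passage case; for symmetric similarity between two sentences, both sides would normally be encoded without it.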
