bge-large-zh

Maintained by: BAAI
Model: BAAI/bge-large-zh

Parameter Count: 326M
Embedding Dimension: 1024
License: MIT
Paper: C-Pack: Packaged Resources To Advance General Chinese Embedding

What is bge-large-zh?

BGE-large-zh is a state-of-the-art Chinese text embedding model developed by BAAI that excels in generating dense vector representations for Chinese text. It's specifically designed for tasks like semantic search, retrieval, and similarity computation, achieving top performance on the C-MTEB benchmark.

Implementation Details

The model utilizes a transformer-based architecture and has been trained using both RetroMAE pre-training and contrastive learning on large-scale paired data. It generates 1024-dimensional embeddings and supports sequence lengths up to 512 tokens.

  • Optimized for both retrieval and similarity tasks
  • Supports efficient inference with FP16 precision
  • Trained with a query-side instruction prefix that improves retrieval performance
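The FP16 support mentioned above also applies to storing the embeddings themselves. As a minimal sketch (using random numpy arrays as stand-ins for real model outputs), casting 1024-dimensional vectors from FP32 to FP16 halves the memory footprint of a vector index, typically with negligible impact on similarity search:

```python
import numpy as np

# Placeholder embeddings: 1,000 random vectors with the model's
# 1024-dimensional output shape (real outputs would come from the model).
rng = np.random.default_rng(42)
embeddings_fp32 = rng.standard_normal((1000, 1024)).astype(np.float32)

# Casting to FP16 halves storage: 4 bytes -> 2 bytes per value.
embeddings_fp16 = embeddings_fp32.astype(np.float16)

print(embeddings_fp32.nbytes)  # 4,096,000 bytes
print(embeddings_fp16.nbytes)  # 2,048,000 bytes
```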

Core Capabilities

  • High-quality Chinese text embeddings generation
  • State-of-the-art performance on C-MTEB benchmark
  • Efficient similarity computation and semantic search
  • Cross-lingual capabilities
  • Support for both short and long text embedding
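BGE embeddings are conventionally L2-normalized, so cosine similarity reduces to a plain dot product. The sketch below uses random numpy vectors as stand-ins for real 1024-dimensional model outputs to show this equivalence:

```python
import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize embedding vectors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


# Placeholder "embeddings" with the model's 1024 dimensions.
rng = np.random.default_rng(0)
emb_a = normalize(rng.standard_normal(1024))
emb_b = normalize(rng.standard_normal(1024))

# For unit-length vectors, cosine similarity == dot product.
similarity = float(emb_a @ emb_b)
cosine = float(emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
assert abs(similarity - cosine) < 1e-9
```

This is why normalized embeddings pair well with vector indexes that use inner-product search.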

Frequently Asked Questions

Q: What makes this model unique?

BGE-large-zh stands out for its exceptional performance on Chinese text embedding tasks, ranking first on the C-MTEB benchmark. It's specifically optimized for retrieval tasks and includes special instruction-tuning capabilities that improve performance.

Q: What are the recommended use cases?

The model is ideal for semantic search, document retrieval, similarity calculation, and building vector databases for LLMs. It's particularly effective when used with an instruction prefix for query-passage retrieval tasks.
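As a minimal sketch of the instruction-prefix convention: the BGE documentation recommends prepending a fixed Chinese instruction to queries (but not to passages) before encoding for retrieval. The helper function name below is illustrative, not part of any library:

```python
# Retrieval instruction published in the BGE model documentation
# ("Generate a representation for this sentence to retrieve related articles:").
QUERY_INSTRUCTION = "为这个句子生成表示以用于检索相关文章："


def prepare_queries(queries: list[str]) -> list[str]:
    """Prepend the retrieval instruction to each query before encoding.

    Passages are encoded as-is; only queries receive the prefix.
    """
    return [QUERY_INSTRUCTION + q for q in queries]


queries = prepare_queries(["什么是语义搜索？"])
print(queries[0])
```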
