# BAAI/bge-large-zh
| Property | Value |
|---|---|
| Parameter Count | 326M |
| Embedding Dimension | 1024 |
| License | MIT |
| Paper | C-Pack: Packaged Resources To Advance General Chinese Embedding |
## What is bge-large-zh?
BGE-large-zh is a state-of-the-art Chinese text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence). It generates dense vector representations of Chinese text for tasks such as semantic search, retrieval, and similarity computation, and achieved top performance on the C-MTEB benchmark at the time of its release.
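As a quick illustration, here is a minimal sketch of generating embeddings via the sentence-transformers library, one of the common ways to load BGE models (the example sentences are made up; verify loading details against the official model card):

```python
from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub
model = SentenceTransformer("BAAI/bge-large-zh")

sentences = ["样例数据-1", "样例数据-2"]
# normalize_embeddings=True yields unit-length vectors,
# so cosine similarity reduces to a plain dot product
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 1024)
```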
## Implementation Details
The model uses a transformer-based architecture trained with RetroMAE pre-training followed by contrastive learning on large-scale paired data. It produces 1024-dimensional embeddings and supports input sequences of up to 512 tokens.
- Optimized for both retrieval and similarity tasks
- Supports efficient inference with FP16 precision (see the sketch after this list)
- Instruction-tuned: prepending a retrieval instruction to queries improves retrieval performance
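A minimal FP16 inference sketch with the plain transformers API, assuming BGE's documented pooling convention of taking the [CLS] token embedding and L2-normalizing it (the sentences are illustrative placeholders):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-zh")
# FP16 halves memory use and speeds up inference, typically on a GPU
model = AutoModel.from_pretrained(
    "BAAI/bge-large-zh", torch_dtype=torch.float16
).eval()

sentences = ["样例数据-1", "样例数据-2"]
inputs = tokenizer(sentences, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# BGE pools with the [CLS] token, then L2-normalizes the result
embeddings = outputs.last_hidden_state[:, 0]
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 1024])
```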
## Core Capabilities
- High-quality Chinese text embedding generation
- State-of-the-art performance on the C-MTEB benchmark
- Efficient similarity computation and semantic search (see the sketch after this list)
- Cross-lingual capabilities
- Embedding of both short and long texts (up to the 512-token limit)
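For example, a small semantic-search sketch over an in-memory corpus (the corpus and query strings are invented for illustration):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-zh")

corpus = ["北京是中国的首都", "机器学习是人工智能的一个分支", "今天天气晴朗"]
query = "人工智能包含哪些子领域"

# Normalized embeddings let us rank by dot product (= cosine similarity)
corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode([query], normalize_embeddings=True)

scores = (query_emb @ corpus_emb.T)[0]
for idx in np.argsort(-scores):  # highest-scoring documents first
    print(f"{scores[idx]:.3f}  {corpus[idx]}")
```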
## Frequently Asked Questions
Q: What makes this model unique?
BGE-large-zh stands out for its exceptional performance on Chinese text embedding tasks, ranking first on the C-MTEB benchmark at the time of its release. It is specifically optimized for retrieval, and its instruction tuning means that prepending a short retrieval instruction to queries further improves retrieval accuracy.
Q: What are the recommended use cases?
The model is ideal for semantic search, document retrieval, similarity calculation, and building vector databases for LLM applications. It is particularly effective when queries carry an instruction prefix in query-passage retrieval tasks, as in the sketch below.
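A retrieval sketch using the FlagEmbedding package, whose FlagModel wrapper prepends the instruction to queries automatically; the Chinese instruction string follows the official BGE model card, and the query and passages are invented examples:

```python
from FlagEmbedding import FlagModel

# encode_queries() prepends the retrieval instruction; encode() does not,
# so passages are embedded without the prefix
model = FlagModel(
    "BAAI/bge-large-zh",
    query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章：",
    use_fp16=True,
)

queries = ["如何更换花呗绑定银行卡"]
passages = ["花呗更改绑定银行卡的流程说明", "一段与主题无关的文字"]

q_emb = model.encode_queries(queries)
p_emb = model.encode(passages)
scores = q_emb @ p_emb.T  # higher score = more relevant passage
print(scores)
```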