# BAAI/bge-large-zh
| Property | Value |
|---|---|
| Parameter Count | 326M |
| Embedding Dimension | 1024 |
| License | MIT |
| Paper | C-Pack: Packaged Resources To Advance General Chinese Embedding |
## What is bge-large-zh?
BGE-large-zh is a state-of-the-art Chinese text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence). It generates dense vector representations of Chinese text for tasks such as semantic search, retrieval, and similarity computation, and achieved top performance on the C-MTEB benchmark at the time of its release.
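As a quick illustration, here is a minimal sketch of generating embeddings via the sentence-transformers library, one of the common ways to load BGE models (the example sentences are made up; verify loading details against the official model card):

```python
from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub
model = SentenceTransformer("BAAI/bge-large-zh")

sentences = ["样例数据-1", "样例数据-2"]
# normalize_embeddings=True yields unit-length vectors,
# so cosine similarity reduces to a plain dot product
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 1024)
```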
## Implementation Details
The model uses a transformer-based architecture trained with RetroMAE pre-training followed by contrastive learning on large-scale paired data. It produces 1024-dimensional embeddings and supports input sequences of up to 512 tokens.
- Optimized for both retrieval and similarity tasks
- Supports efficient inference with FP16 precision (see the sketch after this list)
- Instruction-tuned: prepending a retrieval instruction to queries improves retrieval performance
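A minimal FP16 inference sketch with the plain transformers API, assuming BGE's documented pooling convention of taking the [CLS] token embedding and L2-normalizing it (the sentences are illustrative placeholders):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-zh")
# FP16 halves memory use and speeds up inference, typically on a GPU
model = AutoModel.from_pretrained(
    "BAAI/bge-large-zh", torch_dtype=torch.float16
).eval()

sentences = ["样例数据-1", "样例数据-2"]
inputs = tokenizer(sentences, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# BGE pools with the [CLS] token, then L2-normalizes the result
embeddings = outputs.last_hidden_state[:, 0]
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 1024])
```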
## Core Capabilities
- High-quality Chinese text embedding generation
- State-of-the-art performance on the C-MTEB benchmark
- Efficient similarity computation and semantic search (see the sketch after this list)
- Cross-lingual capabilities
- Embedding of both short and long texts (up to the 512-token limit)
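For example, a small semantic-search sketch over an in-memory corpus (the corpus and query strings are invented for illustration):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-zh")

corpus = ["北京是中国的首都", "机器学习是人工智能的一个分支", "今天天气晴朗"]
query = "人工智能包含哪些子领域"

# Normalized embeddings let us rank by dot product (= cosine similarity)
corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode([query], normalize_embeddings=True)

scores = (query_emb @ corpus_emb.T)[0]
for idx in np.argsort(-scores):  # highest-scoring documents first
    print(f"{scores[idx]:.3f}  {corpus[idx]}")
```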
## Frequently Asked Questions
Q: What makes this model unique?
BGE-large-zh stands out for its exceptional performance on Chinese text embedding tasks, ranking first on the C-MTEB benchmark at the time of its release. It is specifically optimized for retrieval, and its instruction tuning means that prepending a short retrieval instruction to queries further improves retrieval accuracy.
Q: What are the recommended use cases?
The model is ideal for semantic search, document retrieval, similarity calculation, and building vector databases for LLM applications. It is particularly effective when queries carry an instruction prefix in query-passage retrieval tasks, as in the sketch below.
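A retrieval sketch using the FlagEmbedding package, whose FlagModel wrapper prepends the instruction to queries automatically; the Chinese instruction string follows the official BGE model card, and the query and passages are invented examples:

```python
from FlagEmbedding import FlagModel

# encode_queries() prepends the retrieval instruction; encode() does not,
# so passages are embedded without the prefix
model = FlagModel(
    "BAAI/bge-large-zh",
    query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章：",
    use_fp16=True,
)

queries = ["如何更换花呗绑定银行卡"]
passages = ["花呗更改绑定银行卡的流程说明", "一段与主题无关的文字"]

q_emb = model.encode_queries(queries)
p_emb = model.encode(passages)
scores = q_emb @ p_emb.T  # higher score = more relevant passage
print(scores)
```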