# BGE Base Chinese v1.5
| Property | Value |
|---|---|
| License | MIT |
| Paper | C-Pack: Packaged Resources To Advance General Chinese Embedding |
| Embedding Dimension | 768 |
| Language | Chinese |
## What is bge-base-zh-v1.5?
BGE Base Chinese v1.5 is a text embedding model designed specifically for Chinese language processing. Part of the BAAI General Embedding (BGE) family, version 1.5 offers a more balanced similarity distribution and stronger retrieval performance than its predecessors. The model converts Chinese text into 768-dimensional dense vectors, making it well suited to tasks such as semantic search, document retrieval, and text similarity analysis.
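As a quick illustration, here is a minimal sketch of embedding a few sentences through the sentence-transformers library; the Hugging Face model ID `BAAI/bge-base-zh-v1.5` is taken from the model card, while the example sentences are purely illustrative:

```python
# Minimal embedding sketch, assuming the sentence-transformers
# package is installed and the model is pulled from the HF Hub.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-zh-v1.5")

sentences = ["样例文档-1", "样例文档-2"]  # illustrative examples
# normalize_embeddings=True yields unit-length vectors, so the
# dot product between two embeddings equals their cosine similarity.
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768)
```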
## Implementation Details
The model is built on a BERT architecture and trained with a combination of RetroMAE pre-training and contrastive learning. Version 1.5 specifically addresses the similarity-distribution issues of earlier versions, where scores clustered in a narrow range, and performs better on queries issued without a retrieval instruction.
- Achieves an average score of 63.13 on the C-MTEB benchmark
- Optimized for both retrieval and general text-embedding tasks
- Supports a maximum sequence length of 512 tokens
- Produces L2-normalized embeddings, so cosine similarity reduces to a simple dot product (see the sketch after this list)
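Because the embeddings are unit-length, similarity scoring is just a matrix product. A minimal sketch, assuming the same sentence-transformers setup as above and illustrative Chinese texts:

```python
# Similarity scoring with normalized embeddings; model ID and
# example texts are assumptions for the demo.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-zh-v1.5")

query = "什么是文本嵌入？"
docs = ["文本嵌入将句子映射为稠密向量。", "今天天气很好。"]

q_emb = model.encode([query], normalize_embeddings=True)
d_emb = model.encode(docs, normalize_embeddings=True)

# With unit-length vectors, the dot product is the cosine similarity.
scores = q_emb @ d_emb.T
print(scores)  # higher score = more semantically similar
```

Normalizing at encode time keeps scoring cheap and makes scores comparable across queries.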
## Core Capabilities
- Text-to-vector embedding generation
- Semantic similarity computation
- Document retrieval optimization
- Cross-encoder reranking support via the companion BGE reranker models (see the sketch after this list)
- Zero-shot transfer to various NLP tasks
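Reranking is handled by the companion BGE cross-encoders rather than by this embedding model itself. A minimal sketch using the FlagEmbedding package; the reranker ID `BAAI/bge-reranker-base` and the example texts are assumptions:

```python
# Reranking retrieved candidates with a companion BGE cross-encoder.
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-base")

query = "什么是文本嵌入？"
candidates = ["文本嵌入将句子映射为稠密向量。", "今天天气很好。"]

# compute_score takes (query, passage) pairs and returns relevance
# scores; sorting candidates by score descending reranks them.
scores = reranker.compute_score([[query, c] for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
print(reranked)
```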
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its improved similarity distribution in v1.5 and strong performance on Chinese text tasks. It achieves competitive results while being more computationally efficient than larger models in the BGE family.
**Q: What are the recommended use cases?**
The model excels in document retrieval, semantic search, and text similarity tasks. It is particularly effective when the recommended query instruction is prepended for retrieval tasks, though v1.5 also performs well without it (see the sketch below).
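A minimal sketch of instructed retrieval; the instruction string below is the one published for the BGE Chinese models (verify against the model card), and the query text is illustrative:

```python
# Instructed retrieval: prepend the instruction to queries only;
# passages are encoded as-is.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-zh-v1.5")

INSTRUCTION = "为这个句子生成表示以用于检索相关文章："

query = "如何评估中文向量模型？"  # illustrative query
q_emb = model.encode([INSTRUCTION + query], normalize_embeddings=True)
```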