# BGE Base Chinese v1.5
| Property | Value |
|---|---|
| License | MIT |
| Paper | C-Pack: Packaged Resources To Advance General Chinese Embedding |
| Embedding Dimension | 768 |
| Language | Chinese |
## What is bge-base-zh-v1.5?
BGE Base Chinese v1.5 is a text embedding model designed specifically for Chinese language processing. Part of the BAAI General Embedding (BGE) family, version 1.5 offers a more balanced similarity distribution and stronger retrieval performance than its predecessors. The model converts Chinese text into 768-dimensional dense vectors, making it well suited to tasks such as semantic search, document retrieval, and text similarity analysis.
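As a quick illustration, here is a minimal sketch of embedding a few sentences through the sentence-transformers library; the Hugging Face model ID `BAAI/bge-base-zh-v1.5` is taken from the model card, while the example sentences are purely illustrative:

```python
# Minimal embedding sketch, assuming the sentence-transformers
# package is installed and the model is pulled from the HF Hub.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-zh-v1.5")

sentences = ["样例文档-1", "样例文档-2"]  # illustrative examples
# normalize_embeddings=True yields unit-length vectors, so the
# dot product between two embeddings equals their cosine similarity.
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768)
```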
## Implementation Details
The model is built on a BERT architecture and trained with a combination of RetroMAE pre-training and contrastive learning. Version 1.5 specifically addresses the similarity-distribution issues of earlier versions, where scores clustered in a narrow range, and performs better on queries issued without a retrieval instruction.
- Achieves an average score of 63.13 on the C-MTEB benchmark
- Optimized for both retrieval and general text-embedding tasks
- Supports a maximum sequence length of 512 tokens
- Produces L2-normalized embeddings, so cosine similarity reduces to a simple dot product (see the sketch after this list)
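Because the embeddings are unit-length, similarity scoring is just a matrix product. A minimal sketch, assuming the same sentence-transformers setup as above and illustrative Chinese texts:

```python
# Similarity scoring with normalized embeddings; model ID and
# example texts are assumptions for the demo.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-zh-v1.5")

query = "什么是文本嵌入？"
docs = ["文本嵌入将句子映射为稠密向量。", "今天天气很好。"]

q_emb = model.encode([query], normalize_embeddings=True)
d_emb = model.encode(docs, normalize_embeddings=True)

# With unit-length vectors, the dot product is the cosine similarity.
scores = q_emb @ d_emb.T
print(scores)  # higher score = more semantically similar
```

Normalizing at encode time keeps scoring cheap and makes scores comparable across queries.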
## Core Capabilities
- Text-to-vector embedding generation
- Semantic similarity computation
- Document retrieval optimization
- Cross-encoder reranking support via the companion BGE reranker models (see the sketch after this list)
- Zero-shot transfer to various NLP tasks
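Reranking is handled by the companion BGE cross-encoders rather than by this embedding model itself. A minimal sketch using the FlagEmbedding package; the reranker ID `BAAI/bge-reranker-base` and the example texts are assumptions:

```python
# Reranking retrieved candidates with a companion BGE cross-encoder.
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-base")

query = "什么是文本嵌入？"
candidates = ["文本嵌入将句子映射为稠密向量。", "今天天气很好。"]

# compute_score takes (query, passage) pairs and returns relevance
# scores; sorting candidates by score descending reranks them.
scores = reranker.compute_score([[query, c] for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
print(reranked)
```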
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its improved similarity distribution in v1.5 and strong performance on Chinese text tasks. It achieves competitive results while being more computationally efficient than larger models in the BGE family.
**Q: What are the recommended use cases?**
The model excels in document retrieval, semantic search, and text similarity tasks. It is particularly effective when the recommended query instruction is prepended for retrieval tasks, though v1.5 also performs well without it (see the sketch below).
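A minimal sketch of instructed retrieval; the instruction string below is the one published for the BGE Chinese models (verify against the model card), and the query text is illustrative:

```python
# Instructed retrieval: prepend the instruction to queries only;
# passages are encoded as-is.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-zh-v1.5")

INSTRUCTION = "为这个句子生成表示以用于检索相关文章："

query = "如何评估中文向量模型？"  # illustrative query
q_emb = model.encode([INSTRUCTION + query], normalize_embeddings=True)
```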