bge-large-zh-v1.5

Maintained by: BAAI

License: MIT
Language: Chinese
Embedding Dimension: 1024
Downloads: 1.2M+

What is bge-large-zh-v1.5?

BGE-Large-ZH-V1.5 is a state-of-the-art Chinese-language embedding model developed by BAAI (Beijing Academy of Artificial Intelligence). It improves on earlier BGE releases by producing a more reasonable similarity score distribution while maintaining strong retrieval performance. The model generates 1024-dimensional embeddings and is optimized for Chinese text similarity and retrieval tasks.
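As a minimal sketch of generating embeddings with the Sentence-Transformers library (the sentences here are invented for illustration):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-zh-v1.5")

# Illustrative Chinese sentences
sentences = ["样例文本-1", "样例文本-2"]

# normalize_embeddings=True returns unit-length vectors, so the dot
# product of two embeddings equals their cosine similarity
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)               # (2, 1024)
print(embeddings[0] @ embeddings[1])  # cosine similarity
```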

Implementation Details

The model is built on a Transformer encoder architecture, pre-trained with RetroMAE and then fine-tuned with contrastive learning on large-scale paired data. Compared with previous versions, it produces a more reasonable similarity score distribution, making it more reliable in practical applications.

  • Optimized for both retrieval and general similarity tasks
  • Supports instruction-based queries for enhanced retrieval performance (see the sketch after this list)
  • Achieves state-of-the-art performance on C-MTEB benchmark
  • Compatible with popular frameworks like Sentence-Transformers and Hugging Face
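
A sketch of the instruction-prefixed retrieval pattern using Hugging Face transformers directly: BGE models take the [CLS] token embedding and L2-normalize it, and short queries for passage retrieval are prefixed with the Chinese BGE instruction. The query and passage texts below are invented for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-zh-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-large-zh-v1.5")
model.eval()

# Short queries can be prefixed with this instruction;
# passages themselves are encoded without it.
instruction = "为这个句子生成表示以用于检索相关文章："
query = instruction + "什么是向量检索？"
passage = "向量检索通过比较嵌入向量来查找语义相近的文本。"

with torch.no_grad():
    inputs = tokenizer([query, passage], padding=True, truncation=True,
                       return_tensors="pt")
    outputs = model(**inputs)
    # [CLS] pooling followed by L2 normalization
    embeddings = outputs.last_hidden_state[:, 0]
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)

print(embeddings.shape)                        # torch.Size([2, 1024])
print((embeddings[0] @ embeddings[1]).item())  # query-passage similarity
```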

Core Capabilities

  • Text embedding generation for Chinese content
  • Semantic similarity calculation
  • Document retrieval and ranking (see the ranking sketch after this list)
  • Pairs well with cross-encoder rerankers for high-precision matching
  • Support for both short and long text processing
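
These capabilities combine naturally into similarity-based ranking. A short sketch, with the query and documents made up for the example:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-zh-v1.5")

query = "如何提高睡眠质量？"
docs = [
    "规律作息和减少睡前使用手机有助于改善睡眠。",
    "股票市场今日大幅上涨。",
    "适量运动可以帮助更快入睡。",
]

# Encode with normalization so dot product == cosine similarity
q = model.encode(query, normalize_embeddings=True)
d = model.encode(docs, normalize_embeddings=True)

# Rank documents by similarity to the query, highest first
scores = d @ q
for i in np.argsort(-scores):
    print(f"{scores[i]:.4f}  {docs[i]}")
```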

Frequently Asked Questions

Q: What makes this model unique?

The v1.5 release features an improved similarity score distribution and strong retrieval performance without requiring instruction prompts, making it more versatile than previous versions. It achieves state-of-the-art results on the C-MTEB benchmark.

Q: What are the recommended use cases?

The model excels in Chinese text retrieval, semantic search, document similarity comparison, and question-answering systems. It's particularly effective when combined with a reranking model for high-precision results, as sketched below.
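
An illustrative sketch of such a retrieve-then-rerank pipeline, assuming BAAI/bge-reranker-large as the cross-encoder reranker; the query and corpus are made up for the example:

```python
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Stage 1: fast candidate retrieval with the bi-encoder
embedder = SentenceTransformer("BAAI/bge-large-zh-v1.5")

query = "人工智能有哪些应用领域？"
corpus = [
    "人工智能被广泛应用于医疗、金融和自动驾驶等领域。",
    "今天的午餐是牛肉面。",
    "机器翻译和语音识别是人工智能的典型应用。",
    "这部电影的票房表现不佳。",
]

q = embedder.encode(query, normalize_embeddings=True)
d = embedder.encode(corpus, normalize_embeddings=True)
topk = np.argsort(-(d @ q))[:2]  # keep the top 2 candidates

# Stage 2: rescore the candidates with the cross-encoder reranker
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-large")
reranker = AutoModelForSequenceClassification.from_pretrained("BAAI/bge-reranker-large")
reranker.eval()

pairs = [[query, corpus[i]] for i in topk]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")
    scores = reranker(**inputs).logits.view(-1).float()

for i, s in zip(topk, scores):
    print(f"{s.item():.4f}  {corpus[i]}")
```

The bi-encoder keeps first-stage retrieval cheap, while the cross-encoder scores each query-document pair jointly, which typically yields higher precision on the final ranking.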
