bge-large-zh-v1.5

Maintained by: BAAI

License: MIT
Language: Chinese
Embedding Dimension: 1024
Downloads: 1.2M+

What is bge-large-zh-v1.5?

BGE-Large-ZH-V1.5 is a state-of-the-art Chinese-language embedding model developed by BAAI (Beijing Academy of Artificial Intelligence). It improves on earlier BGE releases by producing a more reasonable similarity score distribution while maintaining strong retrieval performance. The model generates 1024-dimensional embeddings and is optimized for Chinese text similarity and retrieval tasks.
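As a minimal sketch of generating embeddings with the Sentence-Transformers library (the sentences here are invented for illustration):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-zh-v1.5")

# Illustrative Chinese sentences
sentences = ["样例文本-1", "样例文本-2"]

# normalize_embeddings=True returns unit-length vectors, so the dot
# product of two embeddings equals their cosine similarity
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)               # (2, 1024)
print(embeddings[0] @ embeddings[1])  # cosine similarity
```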

Implementation Details

The model is built on a Transformer encoder architecture, pre-trained with RetroMAE and then fine-tuned with contrastive learning on large-scale paired data. Compared with previous versions, it produces a more reasonable similarity score distribution, making it more reliable in practical applications.

  • Optimized for both retrieval and general similarity tasks
  • Supports instruction-based queries for enhanced retrieval performance (see the sketch after this list)
  • Achieves state-of-the-art performance on C-MTEB benchmark
  • Compatible with popular frameworks like Sentence-Transformers and Hugging Face
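
A sketch of the instruction-prefixed retrieval pattern using Hugging Face transformers directly: BGE models take the [CLS] token embedding and L2-normalize it, and short queries for passage retrieval are prefixed with the Chinese BGE instruction. The query and passage texts below are invented for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-zh-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-large-zh-v1.5")
model.eval()

# Short queries can be prefixed with this instruction;
# passages themselves are encoded without it.
instruction = "为这个句子生成表示以用于检索相关文章："
query = instruction + "什么是向量检索？"
passage = "向量检索通过比较嵌入向量来查找语义相近的文本。"

with torch.no_grad():
    inputs = tokenizer([query, passage], padding=True, truncation=True,
                       return_tensors="pt")
    outputs = model(**inputs)
    # [CLS] pooling followed by L2 normalization
    embeddings = outputs.last_hidden_state[:, 0]
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)

print(embeddings.shape)                        # torch.Size([2, 1024])
print((embeddings[0] @ embeddings[1]).item())  # query-passage similarity
```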

Core Capabilities

  • Text embedding generation for Chinese content
  • Semantic similarity calculation
  • Document retrieval and ranking (see the ranking sketch after this list)
  • Pairs well with cross-encoder rerankers for high-precision matching
  • Support for both short and long text processing
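
These capabilities combine naturally into similarity-based ranking. A short sketch, with the query and documents made up for the example:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-zh-v1.5")

query = "如何提高睡眠质量？"
docs = [
    "规律作息和减少睡前使用手机有助于改善睡眠。",
    "股票市场今日大幅上涨。",
    "适量运动可以帮助更快入睡。",
]

# Encode with normalization so dot product == cosine similarity
q = model.encode(query, normalize_embeddings=True)
d = model.encode(docs, normalize_embeddings=True)

# Rank documents by similarity to the query, highest first
scores = d @ q
for i in np.argsort(-scores):
    print(f"{scores[i]:.4f}  {docs[i]}")
```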

Frequently Asked Questions

Q: What makes this model unique?

The v1.5 release features an improved similarity score distribution and strong retrieval performance without requiring instruction prompts, making it more versatile than previous versions. It achieves state-of-the-art results on the C-MTEB benchmark.

Q: What are the recommended use cases?

The model excels in Chinese text retrieval, semantic search, document similarity comparison, and question-answering systems. It's particularly effective when combined with a reranking model for high-precision results, as sketched below.
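
An illustrative sketch of such a retrieve-then-rerank pipeline, assuming BAAI/bge-reranker-large as the cross-encoder reranker; the query and corpus are made up for the example:

```python
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Stage 1: fast candidate retrieval with the bi-encoder
embedder = SentenceTransformer("BAAI/bge-large-zh-v1.5")

query = "人工智能有哪些应用领域？"
corpus = [
    "人工智能被广泛应用于医疗、金融和自动驾驶等领域。",
    "今天的午餐是牛肉面。",
    "机器翻译和语音识别是人工智能的典型应用。",
    "这部电影的票房表现不佳。",
]

q = embedder.encode(query, normalize_embeddings=True)
d = embedder.encode(corpus, normalize_embeddings=True)
topk = np.argsort(-(d @ q))[:2]  # keep the top 2 candidates

# Stage 2: rescore the candidates with the cross-encoder reranker
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-large")
reranker = AutoModelForSequenceClassification.from_pretrained("BAAI/bge-reranker-large")
reranker.eval()

pairs = [[query, corpus[i]] for i in topk]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")
    scores = reranker(**inputs).logits.view(-1).float()

for i, s in zip(topk, scores):
    print(f"{s.item():.4f}  {corpus[i]}")
```

The bi-encoder keeps first-stage retrieval cheap, while the cross-encoder scores each query-document pair jointly, which typically yields higher precision on the final ranking.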
