# Dmeta-embedding-zh
| Property | Value |
|---|---|
| Model Size | 400MB |
| Context Window | 1024 tokens |
| Language Support | Chinese, English |
| License | Apache-2.0 |
## What is Dmeta-embedding-zh?
Dmeta-embedding-zh is a compact (400MB) Chinese embedding model designed for cross-domain and cross-task applications. It currently ranks second on the MTEB Chinese leaderboard and performs well across a range of scenarios, including search engines, Q&A systems, intelligent customer service, and LLM+RAG applications.
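A minimal encoding sketch with Sentence-Transformers is shown below. The model id `DMetaSoul/Dmeta-embedding-zh` is assumed here; verify it against the official Hugging Face model card.

```python
from sentence_transformers import SentenceTransformer

# Assumed Hugging Face model id; check the official model card.
model = SentenceTransformer("DMetaSoul/Dmeta-embedding-zh")

sentences = ["胡子长得太快怎么办?", "如何使头发增长速度变快?"]
# normalize_embeddings=True yields unit-length vectors, so a dot
# product between two embeddings equals their cosine similarity.
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, embedding_dim)
```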
## Implementation Details
The model can be run through several inference frameworks, including Sentence-Transformers, LangChain, and Hugging Face Transformers (a plain-Transformers sketch follows the list below). Its training combines large-scale weak-label contrastive learning with high-quality supervised learning across diverse domains:
- Trained on billion-scale weakly supervised text-pair data
- Incorporates 30 million supervised sentence-pair samples
- Optimized for retrieval tasks via hard-negative sampling
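For the plain Hugging Face Transformers path, the sketch below encodes a batch with `AutoModel` and mean-pools over valid tokens. Mean pooling is an assumption for illustration; the model card may recommend a different pooling strategy (e.g. CLS pooling).

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model id, as above.
tokenizer = AutoTokenizer.from_pretrained("DMetaSoul/Dmeta-embedding-zh")
model = AutoModel.from_pretrained("DMetaSoul/Dmeta-embedding-zh")
model.eval()

texts = ["搜索引擎查询改写", "智能客服问答"]
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=1024, return_tensors="pt")

with torch.no_grad():
    out = model(**batch)

# Mean pooling: average hidden states over non-padding tokens only.
mask = batch["attention_mask"].unsqueeze(-1).float()
emb = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
emb = torch.nn.functional.normalize(emb, p=2, dim=1)  # unit-norm for cosine
```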
## Core Capabilities
- Strong cross-domain generalization
- Efficient inference thanks to the compact 400MB model size
- Extended context window of 1024 tokens
- Works with multiple inference frameworks (Sentence-Transformers, LangChain, Transformers)
- Top-tier results on the MTEB Chinese benchmark (currently ranked second)
## Frequently Asked Questions
### Q: What makes this model unique?
Its uniqueness comes from combining large-scale weak-label contrastive learning, high-quality supervised learning, and retrieval-specific optimization with hard-negative sampling. This yields strong cross-domain performance at a relatively small model size.
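To make the training recipe concrete, here is a generic InfoNCE-style contrastive loss with in-batch positives and mined hard negatives. This illustrates the general technique only, not the authors' actual training code; all names and the temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(query_emb, pos_emb, hard_neg_emb, temperature=0.05):
    """Generic contrastive loss sketch.

    query_emb, pos_emb: (B, D) paired query/positive embeddings.
    hard_neg_emb:       (B, D) one mined hard negative per query.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    n = F.normalize(hard_neg_emb, dim=-1)
    # Each query scores against every in-batch positive (B columns)
    # plus its own mined hard negative (1 extra column).
    logits = torch.cat([q @ p.t(), (q * n).sum(-1, keepdim=True)], dim=1)
    logits = logits / temperature
    # The true positive for query i is column i (the diagonal).
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```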
### Q: What are the recommended use cases?
The model is optimized for search engines, question-answering systems, intelligent customer service, and LLM+RAG applications; it is especially strong where cross-domain understanding and retrieval are required. A toy retrieval sketch follows.
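The snippet below shows the retrieval pattern those use cases share: embed a corpus and a query, then rank documents by cosine similarity. The corpus, query, and model id are illustrative assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("DMetaSoul/Dmeta-embedding-zh")  # assumed id

corpus = ["退货政策说明", "如何重置密码", "配送时间与运费查询"]
query = "我想退货怎么操作?"

doc_emb = model.encode(corpus, normalize_embeddings=True)
q_emb = model.encode([query], normalize_embeddings=True)[0]

# Embeddings are unit-norm, so a dot product is the cosine similarity.
scores = doc_emb @ q_emb
top_k = np.argsort(-scores)[:2]  # indices of the two best matches
for i in top_k:
    print(corpus[i], float(scores[i]))
```

In a RAG pipeline, the top-ranked passages would then be inserted into the LLM prompt as context.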