# Dmeta-embedding-zh
| Property | Value |
|---|---|
| Model Size | 400MB |
| Context Window | 1024 tokens |
| Language Support | Chinese, English |
| License | Apache-2.0 |
## What is Dmeta-embedding-zh?
Dmeta-embedding-zh is a compact (400MB) Chinese embedding model designed for cross-domain and cross-task applications. It currently ranks second on the MTEB Chinese leaderboard and performs well across a range of scenarios, including search engines, Q&A systems, intelligent customer service, and LLM+RAG applications.
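A minimal encoding sketch with Sentence-Transformers is shown below. The model id `DMetaSoul/Dmeta-embedding-zh` is assumed here; verify it against the official Hugging Face model card.

```python
from sentence_transformers import SentenceTransformer

# Assumed Hugging Face model id; check the official model card.
model = SentenceTransformer("DMetaSoul/Dmeta-embedding-zh")

sentences = ["胡子长得太快怎么办?", "如何使头发增长速度变快?"]
# normalize_embeddings=True yields unit-length vectors, so a dot
# product between two embeddings equals their cosine similarity.
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, embedding_dim)
```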
## Implementation Details
The model can be run through several inference frameworks, including Sentence-Transformers, LangChain, and Hugging Face Transformers (a plain-Transformers sketch follows the list below). Its training combines large-scale weak-label contrastive learning with high-quality supervised learning across diverse domains:
- Trained on billion-scale weakly supervised text-pair data
- Incorporates 30 million supervised sentence-pair samples
- Optimized for retrieval tasks via hard-negative sampling
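For the plain Hugging Face Transformers path, the sketch below encodes a batch with `AutoModel` and mean-pools over valid tokens. Mean pooling is an assumption for illustration; the model card may recommend a different pooling strategy (e.g. CLS pooling).

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model id, as above.
tokenizer = AutoTokenizer.from_pretrained("DMetaSoul/Dmeta-embedding-zh")
model = AutoModel.from_pretrained("DMetaSoul/Dmeta-embedding-zh")
model.eval()

texts = ["搜索引擎查询改写", "智能客服问答"]
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=1024, return_tensors="pt")

with torch.no_grad():
    out = model(**batch)

# Mean pooling: average hidden states over non-padding tokens only.
mask = batch["attention_mask"].unsqueeze(-1).float()
emb = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
emb = torch.nn.functional.normalize(emb, p=2, dim=1)  # unit-norm for cosine
```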
## Core Capabilities
- Strong cross-domain generalization
- Efficient inference thanks to the compact 400MB model size
- Extended context window of 1024 tokens
- Works with multiple inference frameworks (Sentence-Transformers, LangChain, Transformers)
- Top-tier results on the MTEB Chinese benchmark (currently ranked second)
## Frequently Asked Questions
### Q: What makes this model unique?
Its uniqueness comes from combining large-scale weak-label contrastive learning, high-quality supervised learning, and retrieval-specific optimization with hard-negative sampling. This yields strong cross-domain performance at a relatively small model size.
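To make the training recipe concrete, here is a generic InfoNCE-style contrastive loss with in-batch positives and mined hard negatives. This illustrates the general technique only, not the authors' actual training code; all names and the temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(query_emb, pos_emb, hard_neg_emb, temperature=0.05):
    """Generic contrastive loss sketch.

    query_emb, pos_emb: (B, D) paired query/positive embeddings.
    hard_neg_emb:       (B, D) one mined hard negative per query.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    n = F.normalize(hard_neg_emb, dim=-1)
    # Each query scores against every in-batch positive (B columns)
    # plus its own mined hard negative (1 extra column).
    logits = torch.cat([q @ p.t(), (q * n).sum(-1, keepdim=True)], dim=1)
    logits = logits / temperature
    # The true positive for query i is column i (the diagonal).
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```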
### Q: What are the recommended use cases?
The model is optimized for search engines, question-answering systems, intelligent customer service, and LLM+RAG applications; it is especially strong where cross-domain understanding and retrieval are required. A toy retrieval sketch follows.
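The snippet below shows the retrieval pattern those use cases share: embed a corpus and a query, then rank documents by cosine similarity. The corpus, query, and model id are illustrative assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("DMetaSoul/Dmeta-embedding-zh")  # assumed id

corpus = ["退货政策说明", "如何重置密码", "配送时间与运费查询"]
query = "我想退货怎么操作?"

doc_emb = model.encode(corpus, normalize_embeddings=True)
q_emb = model.encode([query], normalize_embeddings=True)[0]

# Embeddings are unit-norm, so a dot product is the cosine similarity.
scores = doc_emb @ q_emb
top_k = np.argsort(-scores)[:2]  # indices of the two best matches
for i in top_k:
    print(corpus[i], float(scores[i]))
```

In a RAG pipeline, the top-ranked passages would then be inserted into the LLM prompt as context.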