Conan-embedding-v1
| Property | Value |
|---|---|
| Parameters | 326M |
| License | CC-BY-NC 4.0 |
| Architecture | BERT-based |
| Paper | arXiv:2408.15710 |
| Author | TencentBAC |
What is Conan-embedding-v1?
Conan-embedding-v1 is a Chinese text embedding model developed by Tencent BAC Group. It reports an average score of 72.62 on the Chinese MTEB (C-MTEB) benchmark, outperforming competing models on tasks such as classification, clustering, and retrieval. Its main distinguishing feature is an enhanced negative sampling technique used during training to produce more discriminative text embeddings.
Implementation Details
The model is implemented in PyTorch and follows the BERT architecture, adapted for generating text embeddings. Its weights use the F32 tensor type and are stored in the Safetensors format.
Reported benchmark scores by task category:
- Classification: 75.03
- Clustering: 66.33
- Reranking: 72.76
- Retrieval: 76.67
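For capacity planning, the parameter count and F32 tensor type listed above give a rough estimate of the weight size. The sketch below is simple arithmetic and assumes "326M" means roughly 326,000,000 parameters; the helper name `weight_footprint_bytes` is made up for illustration, and the figure excludes activations and any framework overhead.

```python
# Rough weight-size estimate from the figures in the table above.
PARAMS = 326_000_000   # assumption: "326M" rounded to 326 million parameters
BYTES_PER_F32 = 4      # a 32-bit float occupies 4 bytes


def weight_footprint_bytes(num_params, bytes_per_param=BYTES_PER_F32):
    """Return the storage needed for the raw weights, in bytes."""
    return num_params * bytes_per_param


size = weight_footprint_bytes(PARAMS)
print(f"{size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")  # prints "1.30 GB (1.21 GiB)"
```

Loading the weights in a 16-bit type would roughly halve this estimate, though whether that affects embedding quality is not covered here.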
Core Capabilities
- Robust performance across Chinese NLP tasks
- Specialized negative sampling methodology
- Strong sentence embedding capabilities for Chinese text
- Efficient retrieval and reranking capabilities
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its enhanced negative sampling approach, which helps it achieve strong results across a range of Chinese NLP tasks. As a sentence embedding model, it scores consistently across classification, clustering, and retrieval benchmarks.
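To illustrate why the choice of negatives matters, here is a minimal sketch of a generic InfoNCE-style contrastive loss, a common objective for training embedding models. It is not the specific method from the paper, which this page does not describe; the function name `info_nce_loss` and the temperature value are assumptions chosen for illustration.

```python
import numpy as np


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Generic InfoNCE loss for one query, one positive, and k negatives.

    query, positive: 1-D vectors; negatives: 2-D array of shape (k, dim).
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = normalize(np.asarray(query, dtype=np.float32))
    p = normalize(np.asarray(positive, dtype=np.float32))
    n = normalize(np.asarray(negatives, dtype=np.float32))

    # Cosine similarities scaled by temperature; the positive sits at index 0.
    logits = np.concatenate(([q @ p], n @ q)) / temperature
    # Numerically stable log-softmax evaluated at the positive's index.
    logits = logits - logits.max()
    log_prob_positive = logits[0] - np.log(np.exp(logits).sum())
    return float(-log_prob_positive)


query, positive = [1.0, 0.0], [0.9, 0.1]
easy = info_nce_loss(query, positive, [[-1.0, 0.0]])  # negative far from the query
hard = info_nce_loss(query, positive, [[0.8, 0.3]])   # negative close to the query
print(easy < hard)  # the hard negative produces the larger loss
```

An easy negative contributes almost nothing to the loss, so it provides little training signal; a hard negative, one that sits close to the query, yields a larger loss and therefore a stronger gradient. That is the general motivation behind investing in better negative samples.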
Q: What are the recommended use cases?
The model is particularly well-suited for Chinese text processing tasks including semantic similarity comparison, document clustering, information retrieval, and text classification. It's especially effective for applications requiring high-quality sentence embeddings in Chinese language contexts.
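A minimal sketch of the semantic similarity and retrieval use cases follows. It assumes the model is published on Hugging Face under the repo id `TencentBAC/Conan-embedding-v1` and can be loaded with the `sentence-transformers` package; neither is confirmed on this page. The helpers `encode` and `rank_by_cosine` are illustrative names, not part of any library.

```python
import numpy as np

MODEL_ID = "TencentBAC/Conan-embedding-v1"  # assumed Hugging Face repo id


def encode(texts, model_id=MODEL_ID):
    """Embed a list of strings (downloads the model on first use)."""
    # Imported lazily so rank_by_cosine works without the package installed.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_id)
    return model.encode(texts, normalize_embeddings=True)


def rank_by_cosine(query_vec, doc_vecs):
    """Return (doc_index, cosine_score) pairs, most similar first."""
    q = np.asarray(query_vec, dtype=np.float32)
    d = np.asarray(doc_vecs, dtype=np.float32)
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)  # descending by similarity
    return [(int(i), float(scores[i])) for i in order]


# Example usage (requires network access and the sentence-transformers package):
# docs = ["今天天气很好", "我喜欢吃苹果", "明天可能会下雨"]
# vecs = encode(["天气怎么样？"] + docs)
# print(rank_by_cosine(vecs[0], vecs[1:]))
```

The same ranking helper covers reranking and nearest-neighbor lookup over small collections; larger corpora would typically use a vector index instead of a dense matrix product.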