Conan-embedding-v1

Maintained by: TencentBAC

Property        Value
Parameters      326M
License         CC-BY-NC 4.0
Architecture    BERT-based
Paper           arXiv:2408.15710
Author          TencentBAC

What is Conan-embedding-v1?

Conan-embedding-v1 is a state-of-the-art Chinese text embedding model developed by Tencent BAC Group. It achieves an average score of 72.62 across Chinese benchmark tasks, outperforming competing models in classification, clustering, and retrieval. Its key distinguishing feature is an enhanced negative sampling strategy that yields more effective text embeddings.

Implementation Details

The model is implemented using PyTorch and follows the BERT architecture, optimized for generating text embeddings. It uses F32 tensor types and leverages Safetensors for model storage.

  • Classification: 75.03 average score
  • Clustering: 66.33 average score
  • Reranking: 72.76 average score
  • Retrieval: 76.67 average score
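As a sketch of how an embedding model like this is typically used: encode sentences into vectors, then compare them with cosine similarity. The model id (`TencentBAC/Conan-embedding-v1`) and the availability of a `sentence-transformers` wrapper are assumptions not confirmed by this card; the similarity function itself is standard.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    # Hypothetical usage; model id and loading API are assumptions.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("TencentBAC/Conan-embedding-v1")
    emb = model.encode(["今天天气很好", "天气不错"])  # two similar Chinese sentences
    print(cosine_similarity(emb[0], emb[1]))
```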

Core Capabilities

  • Robust performance across Chinese NLP tasks
  • Specialized negative sampling methodology
  • Strong Chinese sentence embedding capabilities
  • Efficient retrieval and reranking capabilities

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its enhanced negative sampling approach, which helps it achieve superior performance across various Chinese NLP tasks. It particularly excels in sentence embedding tasks with consistent performance across classification, clustering, and retrieval benchmarks.
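To illustrate the general idea behind enhanced negative sampling, here is a minimal sketch of hard-negative mining: instead of random negatives, training pairs each query with the corpus items most similar to it (excluding the true positive), which gives a stronger contrastive signal. This is a simplified illustration, not the paper's exact procedure; all names below are hypothetical.

```python
import numpy as np

def mine_hard_negatives(query_emb, pos_idx, corpus_embs, k=2):
    """Pick the k corpus items most similar to the query, excluding the positive.

    'Hard' negatives (near-misses) are more informative for contrastive
    training than randomly sampled negatives.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q                 # cosine similarity of each doc to the query
    scores[pos_idx] = -np.inf      # exclude the positive document
    return list(np.argsort(-scores)[:k])

# Toy example: doc 0 is the positive; docs 1 and 3 are the hardest negatives.
query = np.array([1.0, 0.0])
corpus = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])
print(mine_hard_negatives(query, pos_idx=0, corpus_embs=corpus))  # → [1, 3]
```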

Q: What are the recommended use cases?

The model is particularly well-suited for Chinese text processing tasks including semantic similarity comparison, document clustering, information retrieval, and text classification. It's especially effective for applications requiring high-quality sentence embeddings in Chinese language contexts.
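The retrieval use case above can be sketched as ranking candidate documents by cosine similarity to a query embedding. The placeholder vectors below stand in for real model output (e.g. from an `encode` call), so the ranking logic runs without downloading the model.

```python
import numpy as np

def rank_by_similarity(query_emb: np.ndarray, doc_embs: np.ndarray) -> list:
    """Return document indices sorted by descending cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q
    return list(np.argsort(-scores))

# Placeholder embeddings stand in for real encoder output.
query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.9, 0.1, 0.0],   # most similar to the query
    [0.0, 1.0, 0.0],   # unrelated
    [0.5, 0.5, 0.0],   # partially similar
])
print(rank_by_similarity(query, docs))  # → [0, 2, 1]
```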
