text2vec-base-chinese

Maintained by: shibing624

  • Parameter Count: 102M
  • License: Apache 2.0
  • Author: shibing624
  • Base Model: hfl/chinese-macbert-base

What is text2vec-base-chinese?

text2vec-base-chinese is a Chinese sentence embedding model that maps sentences to a 768-dimensional dense vector space. Trained with the CoSENT (Cosine Sentence) approach, it is designed for Chinese text processing tasks including semantic similarity, text matching, and information retrieval.
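The usual entry point is the sentence-transformers library, as in the minimal sketch below; the package choice and the two example sentences are illustrative, while the model ID is the one listed above.

```python
# Minimal sketch: encoding Chinese sentences into 768-dimensional vectors.
# Assumes the sentence-transformers package is installed (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("shibing624/text2vec-base-chinese")

sentences = ["如何更换花呗绑定银行卡", "花呗更改绑定银行卡"]  # illustrative example pair
embeddings = model.encode(sentences)

print(embeddings.shape)  # expected: (2, 768)
```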

Implementation Details

The model is built on hfl/chinese-macbert-base and fine-tuned with a contrastive objective based on cosine similarity between sentence pairs. Input sequences are truncated to 128 tokens, and sentence embeddings are obtained by mean pooling over token embeddings (a sketch of this pooling step follows the list below). On Chinese semantic similarity benchmarks, the model reports Spearman correlation scores of 31.93 on ATEC, 42.67 on BQ, and 79.30 on STS-B.

  • Architecture: CoSENT with Transformer base and mean pooling
  • Input Processing: Supports sequences up to 128 tokens
  • Output: 768-dimensional dense vectors
  • Training Data: Fine-tuned on the shibing624/nli_zh dataset
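A minimal sketch of that mean-pooling step, using the plain Hugging Face transformers API rather than sentence-transformers; the helper function name is illustrative, and the 128-token truncation mirrors the limit noted above.

```python
# Sketch of attention-mask-weighted mean pooling over token embeddings.
# Assumes transformers and torch are installed; max_length follows the 128-token limit noted above.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("shibing624/text2vec-base-chinese")
model = AutoModel.from_pretrained("shibing624/text2vec-base-chinese")

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padded positions via the attention mask.
    token_embeddings = model_output[0]  # (batch, seq_len, 768)
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, dim=1) / torch.clamp(mask.sum(dim=1), min=1e-9)

sentences = ["如何更换花呗绑定银行卡", "花呗更改绑定银行卡"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

embeddings = mean_pooling(output, encoded["attention_mask"])
print(embeddings.shape)  # (2, 768)
```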

Core Capabilities

  • Semantic sentence embedding generation
  • Text similarity computation (see the cosine-similarity sketch after this list)
  • Information retrieval
  • Supports multiple acceleration backends (ONNX, OpenVINO)
  • Efficient CPU and GPU inference options
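The similarity and backend capabilities above can be exercised with sentence-transformers utilities, as in the sketch below; util.cos_sim is part of that library's public API, the example sentences are illustrative, and the ONNX/OpenVINO note applies only to newer library versions.

```python
# Sketch: cosine-similarity text matching between two Chinese sentences.
# Assumes sentence-transformers is installed; example sentences are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("shibing624/text2vec-base-chinese")
# Recent sentence-transformers releases (>= 3.2) also accept backend="onnx" or
# backend="openvino" in the constructor for accelerated CPU inference.

emb = model.encode(["如何更换花呗绑定银行卡", "花呗更改绑定银行卡"], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1])
print(float(score))  # closer to 1.0 means higher semantic similarity
```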

Frequently Asked Questions

Q: What makes this model unique?

The model combines the CoSENT training objective with a Chinese-MacBERT backbone pre-trained on large Chinese corpora, giving strong performance on Chinese semantic similarity benchmarks while maintaining fast inference (a reported throughput of roughly 3,008 QPS in the text2vec benchmark).

Q: What are the recommended use cases?

The model excels in Chinese sentence similarity tasks, semantic search, and text matching applications. It's particularly well-suited for applications requiring semantic understanding of Chinese text at the sentence level.
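As a concrete illustration of the semantic-search use case, the sketch below ranks a tiny in-memory corpus against a query; the corpus, query, and top_k value are illustrative, and util.semantic_search comes from sentence-transformers.

```python
# Sketch: small-scale semantic search over an in-memory Chinese corpus.
# Assumes sentence-transformers is installed; corpus and query strings are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("shibing624/text2vec-base-chinese")

corpus = ["今天天气很好", "我想更换绑定的银行卡", "北京是中国的首都"]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("怎么改绑银行卡", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]

for hit in hits:
    # Each hit carries the corpus index and its cosine-similarity score.
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```

For larger corpora, the same embeddings can be indexed in an approximate-nearest-neighbor store; the brute-force search shown here is only meant to demonstrate the retrieval pattern.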
