# acge_text_embedding
| Property | Value |
|---|---|
| Parameter Count | 326M |
| Maximum Sequence Length | 1024 tokens |
| Embedding Dimensions | 1024 or 1792 |
| Paper | Matryoshka Representation Learning |
| Model Size | 0.65 GB |
## What is acge_text_embedding?
acge_text_embedding is a Chinese text embedding model developed by Intsig's TextIn platform. It uses Matryoshka Representation Learning to produce embeddings of flexible dimensionality, and achieves state-of-the-art performance on the C-MTEB benchmark with a 69.07% average score across 35 tasks.
## Implementation Details
The model employs a variable-length vectorization approach, supporting embedding dimensions of 1024 or 1792. Although it accepts sequences of up to 1024 tokens, it performs optimally at a sequence length of 512 tokens, and it can be run in different precision types (float16, bfloat16, float32) with consistent results.
- Implements Matryoshka Representation Learning for flexible dimensionality
- Supports batch processing with normalization options
- Optimized for both CPU and GPU inference
- Achieves strong performance across classification, clustering, and retrieval tasks
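The key property of Matryoshka-style embeddings is that a leading prefix of the full vector is itself a usable embedding. A minimal numpy sketch of that truncation step (the 1792 and 1024 dimension values come from the table above; the function name and random data are illustrative, not part of the model's API):

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of a Matryoshka embedding
    and L2-normalize so cosine similarity stays meaningful."""
    truncated = emb[..., :dim]
    norm = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / np.clip(norm, 1e-12, None)

# Illustrative stand-in for a batch of two full-size (1792-d) embeddings.
full = np.random.default_rng(0).normal(size=(2, 1792))
small = truncate_embedding(full, 1024)  # shape (2, 1024), rows unit-norm
```

Re-normalizing after truncation is what keeps downstream cosine-similarity scores on the same scale regardless of which dimensionality is chosen.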
## Core Capabilities
- Text Classification (72.75% accuracy)
- Clustering Tasks (58.7% v-measure)
- Pair Classification (87.84% accuracy)
- Reranking (67.99% MAP)
- Retrieval Tasks (72.93% average performance)
- Semantic Textual Similarity (62.09% correlation)
## Frequently Asked Questions
### Q: What makes this model unique?
The model's implementation of Matryoshka Representation Learning allows for flexible embedding dimensions while maintaining high performance. This makes it particularly versatile for different application requirements and computational constraints.
### Q: What are the recommended use cases?
The model excels in Chinese text processing tasks, including semantic search, document classification, clustering, and similarity comparison. It is especially well suited to applications that need to trade embedding size against accuracy at deployment time.
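As an illustration of the semantic-search use case: once query and document embeddings have been computed, retrieval reduces to cosine ranking. A self-contained sketch with toy 3-d vectors standing in for real model outputs (a real pipeline would substitute acge embeddings):

```python
import numpy as np

def cosine_rank(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Return document indices sorted by cosine similarity to the query.
    Assumes `query` and the rows of `docs` are already L2-normalized."""
    scores = docs @ query       # dot product == cosine for unit vectors
    return np.argsort(-scores)  # best match first

def normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy stand-ins for embedding-model outputs.
query = normalize(np.array([1.0, 0.0, 0.0]))
docs = normalize(np.array([[0.9, 0.1, 0.0],    # close to the query
                           [0.0, 1.0, 0.0],    # orthogonal to it
                           [0.5, 0.5, 0.0]]))  # in between
order = cosine_rank(query, docs)  # → [0, 2, 1]
```

Because the embeddings are normalized, the same ranking code works unchanged whether the vectors are full 1792-d embeddings or truncated 1024-d ones.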