# acge_text_embedding
| Property | Value |
|---|---|
| Parameter Count | 326M |
| Maximum Sequence Length | 1024 tokens |
| Embedding Dimensions | 1024 or 1792 |
| Paper | Matryoshka Representation Learning |
| Model Size | 0.65 GB |
## What is acge_text_embedding?
acge_text_embedding is a Chinese text embedding model developed by Intsig's TextIn platform. It uses Matryoshka Representation Learning to produce embeddings of flexible dimensionality, and achieves state-of-the-art performance on the C-MTEB benchmark with a 69.07% average score across 35 tasks.
## Implementation Details
The model employs a variable-length vectorization approach, supporting embedding dimensions of 1024 or 1792. Although it accepts sequences of up to 1024 tokens, it performs optimally at a sequence length of 512 tokens, and it can be run in different precision types (float16, bfloat16, float32) with consistent results.
- Implements Matryoshka Representation Learning for flexible dimensionality
- Supports batch processing with normalization options
- Optimized for both CPU and GPU inference
- Achieves strong performance across classification, clustering, and retrieval tasks
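The key property of Matryoshka-style embeddings is that a leading prefix of the full vector is itself a usable embedding. A minimal numpy sketch of that truncation step (the 1792 and 1024 dimension values come from the table above; the function name and random data are illustrative, not part of the model's API):

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of a Matryoshka embedding
    and L2-normalize so cosine similarity stays meaningful."""
    truncated = emb[..., :dim]
    norm = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / np.clip(norm, 1e-12, None)

# Illustrative stand-in for a batch of two full-size (1792-d) embeddings.
full = np.random.default_rng(0).normal(size=(2, 1792))
small = truncate_embedding(full, 1024)  # shape (2, 1024), rows unit-norm
```

Re-normalizing after truncation is what keeps downstream cosine-similarity scores on the same scale regardless of which dimensionality is chosen.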
## Core Capabilities
- Text Classification (72.75% accuracy)
- Clustering Tasks (58.7% v-measure)
- Pair Classification (87.84% accuracy)
- Reranking (67.99% MAP)
- Retrieval Tasks (72.93% average performance)
- Semantic Textual Similarity (62.09% correlation)
## Frequently Asked Questions
### Q: What makes this model unique?
The model's implementation of Matryoshka Representation Learning allows for flexible embedding dimensions while maintaining high performance. This makes it particularly versatile for different application requirements and computational constraints.
### Q: What are the recommended use cases?
The model excels in Chinese text processing tasks, including semantic search, document classification, clustering, and similarity comparison. It is especially well suited to applications that need to trade embedding size against accuracy at deployment time.
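As an illustration of the semantic-search use case: once query and document embeddings have been computed, retrieval reduces to cosine ranking. A self-contained sketch with toy 3-d vectors standing in for real model outputs (a real pipeline would substitute acge embeddings):

```python
import numpy as np

def cosine_rank(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Return document indices sorted by cosine similarity to the query.
    Assumes `query` and the rows of `docs` are already L2-normalized."""
    scores = docs @ query       # dot product == cosine for unit vectors
    return np.argsort(-scores)  # best match first

def normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy stand-ins for embedding-model outputs.
query = normalize(np.array([1.0, 0.0, 0.0]))
docs = normalize(np.array([[0.9, 0.1, 0.0],    # close to the query
                           [0.0, 1.0, 0.0],    # orthogonal to it
                           [0.5, 0.5, 0.0]]))  # in between
order = cosine_rank(query, docs)  # → [0, 2, 1]
```

Because the embeddings are normalized, the same ranking code works unchanged whether the vectors are full 1792-d embeddings or truncated 1024-d ones.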