acge_text_embedding

Maintained by: aspire

Property                   Value
Parameter Count            326M
Maximum Sequence Length    1024 tokens
Embedding Dimensions       1024 or 1792
Paper                      Matryoshka Representation Learning
Model Size                 0.65 GB

What is acge_text_embedding?

acge_text_embedding is a Chinese text embedding model developed by Intsig's TextIn platform. It uses Matryoshka Representation Learning to produce embeddings whose dimensionality can be chosen to suit the application, and it reports a state-of-the-art average score of 69.07% across the 35 tasks of the C-MTEB benchmark.

Implementation Details

The model produces variable-dimension vectors, supporting embedding dimensions of 1024 or 1792. Although it accepts inputs of up to 1024 tokens, it performs best at a sequence length of 512 tokens. It can be run in float16, bfloat16, or float32 precision with consistent performance across the three.

  • Implements Matryoshka Representation Learning for flexible dimensionality
  • Supports batch processing with normalization options
  • Optimized for both CPU and GPU inference
  • Achieves strong performance across classification, clustering, and retrieval tasks

Core Capabilities

  • Text Classification (72.75% accuracy)
  • Clustering Tasks (58.7% v-measure)
  • Pair Classification (87.84% accuracy)
  • Reranking (67.99% MAP)
  • Retrieval Tasks (72.93% average performance)
  • Semantic Textual Similarity (62.09% correlation)

Frequently Asked Questions

Q: What makes this model unique?

Because it is trained with Matryoshka Representation Learning, the model can emit either 1024- or 1792-dimensional embeddings while maintaining high performance. A single model can therefore serve applications with different storage and compute constraints.

Q: What are the recommended use cases?

The model is suited to Chinese text processing tasks such as semantic search, document classification, clustering, and similarity comparison. It is a particularly good fit when an application needs to choose its embedding dimension without giving up accuracy.
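
For the semantic search use case, the ranking step can be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional vectors are toy values rather than embeddings from this model, and rank_by_cosine is a hypothetical helper, not part of any library.

```python
import numpy as np


def rank_by_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, top_k: int = 3):
    """Return (document index, cosine similarity) pairs, best match first."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q  # cosine similarity of each document with the query
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for document and query embeddings.
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([0.9, 0.1, 0.0])

results = rank_by_cosine(query, docs, top_k=2)
for index, score in results:
    print(index, round(score, 3))
```

In a real pipeline the rows of docs and the query vector would come from the embedding model, and for large collections the brute-force matrix product would usually be replaced by an approximate nearest-neighbor index.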
