KaLM-embedding-multilingual-mini-instruct-v1.5
| Property | Value |
|---|---|
| Model Size | 494M parameters |
| Base Architecture | Qwen2-0.5B |
| MTEB Score | 64.94 |
| C-MTEB Score | 64.13 |
| Author | HIT-TMG |
What is KaLM-embedding-multilingual-mini-instruct-v1.5?
KaLM-embedding-multilingual-mini-instruct-v1.5 is the latest iteration in the KaLM-Embedding series of multilingual embedding models. Built on Qwen2-0.5B, it was trained with weakly-supervised pre-training followed by supervised fine-tuning, achieving state-of-the-art performance on multilingual embedding tasks.
Implementation Details
The model uses the transformers library (version ≥ 4.37.0 required) and integrates seamlessly with the sentence-transformers framework. It supports a maximum sequence length of 512 tokens and includes special instruction handling for asymmetric tasks such as retrieval; a usage sketch follows the feature list below.
- Generates normalized embeddings
- Handles batch processing with customizable batch sizes
- Supports instruction-based prompting for task-specific embeddings
- Compatible with retrieval, reranking, classification, and clustering tasks
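A minimal sentence-transformers sketch, assuming the checkpoint is hosted on the Hugging Face Hub under the author's namespace as HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5; the example sentences and batch size are illustrative:

```python
# Minimal usage sketch; assumes sentence-transformers is installed alongside
# transformers >= 4.37.0, and that the repo id below matches the published model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5"
)
model.max_seq_length = 512  # the model's maximum sequence length

sentences = ["How is the weather today?", "今天天气怎么样?"]
embeddings = model.encode(
    sentences,
    normalize_embeddings=True,  # unit-length vectors, so dot product == cosine
    batch_size=32,              # adjust to available memory
)
print(embeddings.shape)  # (num_sentences, embedding_dim)
```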
Core Capabilities
- Superior performance on MTEB (64.94) and C-MTEB (64.13) benchmarks
- Multilingual text embedding generation
- Instruction-tuned for various NLP tasks
- Efficient processing with batch support
- Normalized embedding output, so dot products equal cosine similarity (see the retrieval sketch below)
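For asymmetric tasks such as retrieval, an instruction prefix is prepended to queries while passages are encoded as-is. The "Instruct: ... \n Query: " template below is an assumption borrowed from the e5/KaLM convention; verify the exact prompt format against the official model card before relying on it:

```python
# Hedged sketch of instruction-prompted retrieval; the prompt template is an
# assumption -- check it against the official model card.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5")

task = "Given a query, retrieve passages that answer the query"
queries = [f"Instruct: {task} \n Query: what is a multilingual embedding model?"]
passages = [
    "Multilingual embedding models map text from many languages into one vector space.",
    "The weather in Harbin is cold in winter.",
]

q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)

# With normalized embeddings, the dot product is cosine similarity.
scores = q_emb @ p_emb.T
print(scores)  # higher score = more relevant passage
```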
Frequently Asked Questions
Q: What makes this model unique?
This model stands out due to its superior training data quality and comprehensive instruction tuning, achieving better performance than comparable models like multilingual-e5-large and bge-m3 on standard benchmarks while maintaining a relatively compact size.
Q: What are the recommended use cases?
The model excels in multilingual applications including text retrieval, semantic search, document classification, and clustering (see the sketch below). It is particularly effective for tasks that benefit from instruction-based prompting rather than task-specific fine-tuning.
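As an illustration of the clustering use case, the embeddings can be fed directly into a standard clusterer such as scikit-learn's KMeans. The documents and cluster count below are invented for the example:

```python
# Illustrative clustering sketch; documents and n_clusters are made up, and
# scikit-learn (>= 1.2 for n_init="auto") is assumed to be installed.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5")

docs = [
    "The stock market fell sharply today.",
    "Central banks raised interest rates again.",
    "The new phone has an improved camera.",
    "This laptop ships with a faster processor.",
]

embeddings = model.encode(docs, normalize_embeddings=True)
labels = KMeans(n_clusters=2, n_init="auto").fit_predict(embeddings)
print(labels)  # e.g. [0, 0, 1, 1]: finance vs. consumer tech
```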