KaLM-embedding-multilingual-mini-instruct-v1.5

Maintained by: HIT-TMG

Property             Value
Model Size           494M parameters
Base Architecture    Qwen2-0.5B
MTEB Score           64.94
C-MTEB Score         64.13
Author               HIT-TMG

What is KaLM-embedding-multilingual-mini-instruct-v1.5?

KaLM-embedding-multilingual-mini-instruct-v1.5 is a multilingual embedding model and the latest iteration in the KaLM-Embedding series. Built on Qwen2-0.5B, it was trained with a combination of weakly-supervised pre-training and supervised fine-tuning, and it reports scores of 64.94 on MTEB and 64.13 on C-MTEB.

Implementation Details

The model runs on the transformers library (version ≥4.37.0 required) and can be loaded through the sentence-transformers framework. It supports a maximum sequence length of 512 tokens and provides instruction handling for asymmetric tasks, where queries and documents are treated differently.

  • Generates normalized embeddings
  • Handles batch processing with customizable batch sizes
  • Includes instruction-based prompting for specific tasks
  • Compatible with retrieval, reranking, classification, and clustering tasks
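
The points above can be sketched with sentence-transformers as follows. This is a minimal sketch, not the authors' reference code: the Hub identifier in MODEL_ID, the "Instruct: ... Query: " template in build_prompt, and the helper name embed are assumptions made for illustration, so check the official model card for the exact values before relying on them.

```python
MODEL_ID = "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5"  # assumed Hub id


def build_prompt(task: str) -> str:
    """Build an instruction prefix (assumed template, verify against the model card)."""
    return f"Instruct: {task}\nQuery: "


def embed(texts, task=None, batch_size=32):
    """Encode texts into normalized embeddings, optionally with an instruction."""
    # Imported lazily so build_prompt can be used without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    model.max_seq_length = 512  # maximum sequence length stated above

    kwargs = {"normalize_embeddings": True, "batch_size": batch_size}
    if task is not None:
        # `prompt` is prepended to every input text before encoding.
        kwargs["prompt"] = build_prompt(task)
    return model.encode(texts, **kwargs)
```

Calling embed(["some query"], task="Given a question, retrieve relevant passages") would download the model on first use. For the document side of an asymmetric task, omit the task argument so that no instruction is prepended.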

Core Capabilities

  • Scores 64.94 on MTEB and 64.13 on C-MTEB
  • Multilingual text embedding generation
  • Instruction-tuned for various NLP tasks
  • Efficient processing with batch support
  • Normalized embedding output
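
Normalized output matters because the dot product of two unit-length vectors equals their cosine similarity, which lets downstream code skip the division by vector norms. The following pure-Python sketch illustrates this with toy 3-dimensional vectors standing in for real embeddings. The helper names are hypothetical and not part of any library.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for real embeddings
a = [3.0, 4.0, 0.0]
b = [4.0, 3.0, 0.0]

a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# For unit vectors, the dot product and the cosine similarity coincide.
assert math.isclose(dot(a_unit, b_unit), cosine(a, b))
```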

Frequently Asked Questions

Q: What makes this model unique?

The model is distinguished by the quality of its training data and its instruction tuning. It is reported to outperform comparable models such as multilingual-e5-large and bge-m3 on standard benchmarks while remaining relatively compact at 494M parameters.

Q: What are the recommended use cases?

The model is suited to multilingual applications including text retrieval, semantic search, document classification, and clustering. It is particularly useful when a task benefits from instruction-based prompting.
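
For the retrieval and semantic search use cases, ranking typically reduces to scoring each document embedding against the query embedding and keeping the best matches. The sketch below assumes the embeddings are already L2-normalized and uses hypothetical 2-dimensional toy vectors in place of model output. top_k is an illustrative helper, not a library function.

```python
import heapq


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (score, doc_id) pairs with the highest dot product.

    Assumes every vector is already L2-normalized, so the dot product
    equals cosine similarity.
    """
    scored = (
        (sum(q * d for q, d in zip(query_vec, vec)), doc_id)
        for doc_id, vec in doc_vecs.items()
    )
    return heapq.nlargest(k, scored)


# Toy unit vectors standing in for document embeddings (hypothetical values)
docs = {
    "doc_en": [1.0, 0.0],
    "doc_zh": [0.6, 0.8],
    "doc_fr": [0.0, 1.0],
}
query = [0.8, 0.6]

results = top_k(query, docs, k=2)
ranked_ids = [doc_id for _, doc_id in results]
```

With these toy values, ranked_ids lists doc_zh first and doc_en second. For large collections, a vector index would normally replace the linear scan shown here.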
