silma-embeddding-matryoshka-v0.1

Maintained By
silma-ai

SILMA Arabic Matryoshka Embedding Model

Property          Value
Parameter Count   135M
License           Apache 2.0
Languages         Arabic, English
Framework         Sentence Transformers
Base Model        aubmindlab/bert-base-arabertv02

What is silma-embeddding-matryoshka-v0.1?

SILMA Arabic Matryoshka Embedding Model is a text embedding model trained with the Matryoshka technique, so its 768-dimensional embeddings can be truncated to as few as 8 dimensions while remaining usable for semantic similarity. It is built on the BERT architecture (base model aubmindlab/bert-base-arabertv02) and is optimized for Arabic and English text across common NLP tasks.
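
The truncation described above can be sketched in a few lines. This is a minimal, library-free illustration rather than the model's own code: the helper names `truncate_embedding` and `cosine_similarity` are hypothetical, and the short toy vectors stand in for real 768-dimensional model outputs.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and L2-normalize them (Matryoshka-style truncation)."""
    if not 1 <= dim <= len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # A zero vector cannot be normalized; return it unchanged.
    return head if norm == 0.0 else [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors standing in for real model embeddings (assumption).
query = [0.9, 0.1, 0.3, 0.2]
document = [0.8, 0.2, 0.1, 0.4]

full_score = cosine_similarity(query, document)
small_score = cosine_similarity(
    truncate_embedding(query, 2), truncate_embedding(document, 2)
)
print(f"4 dims: {full_score:.3f}, 2 dims: {small_score:.3f}")
```

With recent versions of Sentence Transformers, the same effect is usually obtained by passing `truncate_dim` when constructing `SentenceTransformer`. Whether a given size is accurate enough should be checked on your own data.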

Implementation Details

The model was trained on a curated dataset of 2.25M triplets using the Sentence-BERT architecture, in which a transformer encoder is followed by a pooling layer that produces one embedding per input text. Training used the adamw_torch_fused optimizer and cosine-similarity-based loss functions.

  • Flexible dimensionality (8-768D) with maintained performance
  • Optimized for both Arabic and English text
  • Trained with batch size of 250 and learning rate of 1e-05
  • Uses a pooling layer to combine token outputs into a single sentence embedding
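
To make the Matryoshka idea concrete, the sketch below shows how a training loss can be evaluated on nested prefixes of the same embeddings and then combined (here, averaged), which is what encourages the leading dimensions to carry most of the meaning. It is an illustration only: the triplet margin form of the cosine loss, the `margin` value, the prefix sizes in `dims`, and all function names are assumptions, not the maintainers' published training code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def triplet_cosine_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss: the positive should beat the negative in similarity by `margin`."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def matryoshka_loss(anchor, positive, negative, dims, margin=0.5):
    """Average the triplet loss over several prefix lengths of the same embeddings."""
    losses = [
        triplet_cosine_loss(anchor[:d], positive[:d], negative[:d], margin)
        for d in dims
    ]
    return sum(losses) / len(losses)


# Toy 4-dimensional triplet (assumption); real embeddings would have 768 values
# and `dims` would list prefix sizes somewhere between 8 and 768.
anchor = [1.0, 0.0, 0.5, 0.5]
positive = [0.9, 0.1, 0.4, 0.6]
negative = [0.0, 1.0, 0.5, 0.5]

loss = matryoshka_loss(anchor, positive, negative, dims=[2, 4])
print(f"matryoshka loss: {loss:.4f}")
```

Because every prefix contributes to the loss, a vector cut down to its first few dimensions has already been trained to separate positives from negatives on its own.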

Core Capabilities

  • Semantic similarity scoring
  • Cross-lingual text matching
  • Intent classification
  • Question-answer matching
  • Document similarity analysis

Frequently Asked Questions

Q: What makes this model unique?

Because it is trained with the Matryoshka technique, the model's embeddings can be truncated to fewer dimensions while retaining much of their semantic quality, which makes it efficient to deploy under different storage and latency constraints. It is tuned primarily for Arabic while also supporting English and cross-lingual matching.

Q: What are the recommended use cases?

The model is ideal for semantic search, document classification, intent detection, and cross-lingual text matching. It performs particularly well in scenarios requiring flexible resource usage, as dimensions can be adjusted based on performance requirements.
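
The resource trade-off mentioned above is easy to estimate. The sketch below computes the raw storage needed for a vector index at different embedding sizes. It assumes 4-byte float32 values and ignores any index overhead; the corpus size of one million documents and the function name `index_size_bytes` are illustrative assumptions.

```python
def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for `num_vectors` embeddings of `dim` values each (float32 by default)."""
    if num_vectors < 0 or dim < 1 or bytes_per_value < 1:
        raise ValueError("num_vectors must be >= 0; dim and bytes_per_value must be >= 1")
    return num_vectors * dim * bytes_per_value


num_docs = 1_000_000  # hypothetical corpus size
for dim in (768, 256, 64, 8):
    size_mb = index_size_bytes(num_docs, dim) / 1_000_000
    print(f"{dim:>3} dims: {size_mb:,.0f} MB")
```

Under these assumptions, the full 768-dimensional index takes about 3 GB and the 8-dimensional one about 32 MB. Smaller vectors also speed up similarity search, but retrieval quality generally drops as dimensions are removed, so the size is best chosen by evaluating on representative queries.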
