silma-embeddding-sts-v0.1

silma-ai

Bilingual Arabic-English sentence transformer model with 135M parameters, optimized for semantic textual similarity and embedding generation at 768 dimensions.

Property             Value
Parameter Count      135M
Output Dimensions    768
Max Sequence Length  512 tokens
License              Apache 2.0
Languages            Arabic, English

What is silma-embeddding-sts-v0.1?

SILMA Embedding STS is a sentence transformer model that generates semantic embeddings for both Arabic and English text. It is built on bert-base-arabertv02 and was fine-tuned in two phases for semantic textual similarity (STS) tasks.

Implementation Details

The model maps input text to 768-dimensional dense vectors, which are compared using cosine similarity. Training ran in two phases: first on a dataset of 2.25M triplets, then on 30k sentence pairs labeled with similarity scores.

  • Base Architecture: bert-base-arabertv02
  • Training Framework: Sentence Transformers 3.2.0
  • Optimization: Mixed precision training (BF16)
  • Evaluation Metrics: Achieved 85.59% Spearman correlation on Arabic STS tasks
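
The embed-and-compare flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' reference code. It assumes the model is published on the Hugging Face Hub under the identifier `silma-ai/silma-embeddding-sts-v0.1` and that the `sentence-transformers` package is installed; `embed` and `cosine_similarity` are hypothetical helper names chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def embed(sentences, model_id="silma-ai/silma-embeddding-sts-v0.1"):
    """Encode sentences into dense vectors, one list of floats per sentence.

    Assumption: model_id is the Hugging Face Hub identifier of this model.
    The import is deferred so cosine_similarity works without the package.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(sentences).tolist()


# Usage (downloads the model weights on the first call):
#   ar, en = embed(["الطقس جميل اليوم", "The weather is nice today"])
#   print(len(ar), cosine_similarity(ar, en))
```

Scores close to 1 indicate near-identical meaning, and scores near 0 indicate unrelated sentences. The same comparison applies whether the two sentences are in the same language or not.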

Core Capabilities

  • Bilingual semantic similarity assessment
  • Cross-lingual text comparison
  • Semantic search implementation
  • Text classification and clustering
  • Question-answer matching
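
As an illustration of the semantic search use case listed above, the following sketch ranks a small corpus against a query by cosine similarity. It operates on precomputed vectors, so it makes no assumption about how the embeddings were produced; the helper names `normalize` and `top_k` and the toy 3-dimensional vectors are invented for this example.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query_vec, corpus_vecs, k=3):
    """Return (corpus_index, score) pairs for the k most similar corpus vectors."""
    q = normalize(query_vec)
    scored = []
    for idx, vec in enumerate(corpus_vecs):
        d = normalize(vec)
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors stand in for real 768-dimensional embeddings.
corpus = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
hits = top_k([0.9, 0.1, 0.0], corpus, k=2)
print(hits)
```

For larger corpora, the corpus vectors would normally be normalized once and stored, so that each query costs only one pass of dot products.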

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its strong performance on both Arabic and English semantic tasks, with its best reported result on Arabic STS tasks (about 85.6% Spearman correlation). It is optimized for production use, with efficient inference.
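
For readers unfamiliar with the metric, the sketch below shows how a Spearman correlation between model similarity scores and human ratings is commonly computed. The source does not describe the evaluation procedure, so this is the textbook definition (Pearson correlation of the ranks), not necessarily the exact pipeline behind the reported figure. The function names and the sample numbers are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, gold):
    """Spearman correlation: Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(predicted), average_ranks(gold))


# Made-up numbers for illustration only: model cosine scores vs. human ratings.
predicted = [0.91, 0.35, 0.62, 0.10]
gold = [4.8, 2.0, 3.5, 0.5]
print(spearman(predicted, gold))
```

Because only the ordering matters, the metric rewards a model for ranking sentence pairs in the same order as human annotators, regardless of the absolute scale of its cosine scores.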

Q: What are the recommended use cases?

The model excels in applications requiring semantic understanding such as text similarity comparison, document clustering, semantic search, and intent classification. It's particularly effective for Arabic language processing while maintaining good performance for English content.
