silma-embeddding-sts-v0.1

silma-ai

A 135M-parameter bilingual (Arabic-English) sentence transformer model for semantic similarity, fine-tuned on cross-lingual STS tasks. It scores 85.6% on the Arabic STS17 benchmark.

Property           Value
Parameter Count    135M
Model Type         Sentence Transformer
Architecture       BERT-based (arabertv02)
License            Apache 2.0
Languages          Arabic, English
Output Dimension   768

What is silma-embeddding-sts-v0.1?

silma-embeddding-sts-v0.1 is a bilingual sentence transformer model designed for semantic textual similarity in Arabic and English. Built on the arabertv02 architecture, it maps sentences and paragraphs to a 768-dimensional dense vector space, which supports NLP tasks such as semantic search, paraphrase detection, and text classification.
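A minimal sketch of producing these embeddings follows. It assumes the model is published on the Hugging Face Hub under the id silma-ai/silma-embeddding-sts-v0.1 (inferred from the author and model name on this page, not confirmed by it) and that the sentence-transformers package is installed. The helper names load_model, embed, and demo are illustrative only.

```python
# Assumed Hub id, built from the author and model name shown on this page.
MODEL_ID = "silma-ai/silma-embeddding-sts-v0.1"


def load_model(model_id=MODEL_ID):
    """Load the model; downloads weights on first use (needs network access)."""
    # Imported lazily so the rest of this module works without the package.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id)


def embed(model, sentences):
    """Encode a list of sentences into unit-length embedding vectors."""
    # normalize_embeddings=True scales each vector to length 1, so a plain
    # dot product between two vectors equals their cosine similarity.
    return model.encode(sentences, normalize_embeddings=True)


def demo():
    """Encode one English and one Arabic sentence and print the array shape."""
    model = load_model()
    vectors = embed(model, ["The weather is nice today", "الطقس جميل اليوم"])
    # One row per sentence; this page states 768 dimensions per vector.
    print(vectors.shape)
```

Calling demo() runs the whole flow. Nothing is downloaded until load_model() is called.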

Implementation Details

The model was trained in two phases: it was first fine-tuned on 2.25M triplets of Arabic/English samples, then refined on 30k sentence pairs annotated with similarity scores. It scores 85.6% on the STS17 Arabic benchmark.
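This page does not name the loss functions used in either phase. The sketch below only illustrates the two data shapes it mentions, an (anchor, positive, negative) triplet and a scored sentence pair, using generic textbook objectives. The functions triplet_loss and pair_loss, the margin value, and the toy 2-dimensional vectors are all assumptions for illustration, not the authors' training recipe.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Generic triplet margin loss on cosine distance (illustrative only).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows with the violation.
    """
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def pair_loss(u, v, gold_score):
    """Squared error between predicted cosine similarity and a gold score."""
    return (cosine(u, v) - gold_score) ** 2


# Phase-1 style example: one triplet of toy embeddings.
print(triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))

# Phase-2 style example: one sentence pair with an annotated similarity.
print(pair_loss([1.0, 0.0], [0.0, 1.0], 0.5))
```

In a real training run the vectors would come from the encoder and the losses would be averaged over batches. The toy values here only show how each kind of example produces a training signal.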

  • Maximum sequence length: 512 tokens
  • Similarity measure: Cosine similarity
  • Training framework: Sentence-Transformers 3.2.0
  • Hardware optimization: BF16 precision support
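The cosine similarity listed above compares the direction of two embedding vectors and ignores their length. A minimal pure-Python sketch, using made-up 3-dimensional vectors in place of real 768-dimensional embeddings:

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v.

    The result is 1.0 for vectors pointing the same way, 0.0 for
    orthogonal vectors, and -1.0 for vectors pointing opposite ways.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors (not real model output): b is a scaled copy of a, so their
# similarity is approximately 1.0 even though their lengths differ.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
print(cosine_similarity(a, b))
```

Because the measure ignores vector length, embeddings that are normalized to unit length can be compared with a plain dot product instead.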

Core Capabilities

  • Cross-lingual semantic similarity between Arabic and English texts
  • Short and long sentence comparison
  • Question-to-paragraph matching
  • Intent classification and mapping
  • Semantic search functionality
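Semantic search with an embedding model usually reduces to ranking stored document vectors by their similarity to a query vector. The sketch below shows that ranking step on hand-written toy vectors. In practice the vectors would come from the model, and the names normalize and top_k are hypothetical helpers rather than part of any library.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query_vec, corpus_vecs, k=3):
    """Return up to k (index, score) pairs, highest similarity first."""
    query = normalize(query_vec)
    scored = []
    for index, vec in enumerate(corpus_vecs):
        doc = normalize(vec)
        score = sum(q * d for q, d in zip(query, doc))
        scored.append((index, score))
    # Sort by score, best match first, and keep the first k results.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" standing in for encoded documents.
corpus = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query = [1.0, 0.0, 0.0]

for index, score in top_k(query, corpus, k=2):
    print(index, round(score, 3))
```

The same ranking step can serve question-to-paragraph matching (paragraph vectors as the corpus) and intent mapping (one vector per example utterance, returning the label of the best match).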

Frequently Asked Questions

Q: What makes this model unique?

The model combines strong Arabic performance (85.6% on the Arabic STS17 benchmark) with English capability. Its two-phase training, first on triplets and then on scored sentence pairs, is aimed at cross-lingual similarity between the two languages.

Q: What are the recommended use cases?

The model is suited to bilingual applications such as semantic search, content similarity matching, and intent classification. It is a particularly good fit for Arabic-English cross-lingual applications and Arabic-specific NLP tasks.
