silma-embeddding-sts-v0.1

Maintained by: silma-ai

Property            Value
Parameter Count     135M
Model Type          Sentence Transformer
Architecture        BERT-based (arabertv02)
License             Apache 2.0
Languages           Arabic, English
Output Dimension    768

What is silma-embeddding-sts-v0.1?

silma-embeddding-sts-v0.1 is a bilingual sentence transformer model designed for semantic textual similarity (STS) tasks in Arabic and English. Built on the BERT-based arabertv02 model, it maps sentences and paragraphs to a 768-dimensional dense vector space, which supports NLP tasks such as semantic search, paraphrase detection, and text classification.

Implementation Details

The model was trained in two phases: it was first fine-tuned on 2.25M triplets of Arabic/English samples, then further refined on 30k sentence pairs labeled with similarity scores. It scores 85.6% on the Arabic STS17 benchmark.
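To make the two kinds of training data concrete, here is a minimal sketch of what a triplet and a scored sentence pair look like, together with one common objective for each. The source does not say which loss functions were used, so `triplet_margin_loss`, `pair_regression_loss`, the `margin` value, and the 0 to 1 score range are all illustrative assumptions, not the model's actual training recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """Phase-one style sample: the anchor should be closer to positive than to negative."""
    anchor: str
    positive: str
    negative: str


@dataclass(frozen=True)
class ScoredPair:
    """Phase-two style sample: two sentences plus a similarity label."""
    sentence_a: str
    sentence_b: str
    score: float  # assumed here to lie in the range 0..1


def triplet_margin_loss(sim_pos: float, sim_neg: float, margin: float = 0.5) -> float:
    """Hinge loss on similarities (an assumed objective, not confirmed by the source).

    The loss is zero once sim(anchor, positive) exceeds
    sim(anchor, negative) by at least `margin`.
    """
    return max(0.0, margin - (sim_pos - sim_neg))


def pair_regression_loss(predicted_sim: float, target_score: float) -> float:
    """Squared error between a predicted similarity and its label (also an assumption)."""
    return (predicted_sim - target_score) ** 2
```

A triplet only says which of two candidates is closer, while a scored pair says how close two sentences are, which is one plausible reason to use the smaller scored set for the final refinement step.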

  • Maximum sequence length: 512 tokens
  • Similarity measure: Cosine similarity
  • Training framework: Sentence-Transformers 3.2.0
  • Hardware optimization: BF16 precision support

Core Capabilities

  • Cross-lingual semantic similarity between Arabic and English texts
  • Short and long sentence comparison
  • Question-to-paragraph matching
  • Intent classification and mapping
  • Semantic search functionality

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its strong performance on Arabic language tasks while retaining English capability: it scores 85.6% on the Arabic STS17 benchmark. Its two-phase training approach is aimed at cross-lingual understanding between the two languages.

Q: What are the recommended use cases?

The model excels in bilingual applications including semantic search, content similarity matching, and intent classification. It's particularly suitable for Arabic-English cross-lingual applications and Arabic-specific NLP tasks.
