stsb-m-mt-es-distilbert-base-uncased

eduardofv

Spanish sentence similarity model based on DistilBERT and fine-tuned on the STS Benchmark dataset, reaching a Pearson correlation of 0.7451.

Property       Value
Author         eduardofv
Task           Sentence Similarity (Spanish)
Base Model     DistilBERT-base-uncased
Performance    Pearson: 0.7451, Spearman: 0.7364
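
For readers unfamiliar with the reported metrics: STS models are usually scored by correlating predicted similarities with human-annotated gold scores. The sketch below shows how Pearson and Spearman correlations could be computed in plain Python. The `gold` and `predicted` values are made-up illustrative numbers, not data from this model's evaluation, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative values only (not from the actual evaluation).
gold = [5.0, 3.2, 1.0, 4.1, 0.5]          # human scores on a 0-5 scale
predicted = [0.93, 0.61, 0.35, 0.70, 0.10]  # model cosine similarities

print("Pearson:", pearson(gold, predicted))
print("Spearman:", spearman(gold, predicted))
```

Pearson measures how linearly the predictions track the gold scores, while Spearman only checks that the ordering agrees, which is why the two figures can differ slightly.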

What is stsb-m-mt-es-distilbert-base-uncased?

This is a Spanish-language model fine-tuned for Semantic Textual Similarity (STS). Built on the DistilBERT architecture, it was trained on the Spanish portion of the stsb_multi_mt dataset, which consists of STSBenchmark data machine-translated with deepl.com. Fine-tuning raised the Pearson correlation from 0.29 for the base model to 0.74.

Implementation Details

The model uses the DistilBERT architecture and was fine-tuned for semantic similarity in Spanish. Training used a modified version of the Sentence Transformers training script, with the aim of producing sentence embeddings that can be compared meaningfully for Spanish text.

  • Built on DistilBERT-base-uncased architecture
  • Fine-tuned on the Spanish portion of stsb_multi_mt (machine-translated STSBenchmark)
  • Implements Sentence Transformers methodology
  • Optimized for Spanish language understanding
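
The exact modifications made to the training script are not described here. As a rough illustration of the kind of objective the standard Sentence Transformers STS example uses, the sketch below assumes gold scores on a 0-5 scale are normalized to 0-1 and compared with the predicted cosine similarity using mean squared error. The function names are hypothetical, and this is not necessarily what the author's modified script does.

```python
def normalize_score(score, max_score=5.0):
    """Map an STS gold score from the assumed 0-5 scale onto 0-1."""
    return score / max_score


def cosine_similarity_loss(predicted_sims, gold_scores):
    """Mean squared error between predicted cosine similarities
    and normalized gold scores (assumed objective, see lead-in)."""
    targets = [normalize_score(s) for s in gold_scores]
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted_sims, targets)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative batch: two sentence pairs with made-up values.
print(cosine_similarity_loss([0.9, 0.2], [4.5, 1.0]))
```

Minimizing a loss of this shape pushes the embeddings of similar sentence pairs closer together and those of dissimilar pairs apart.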

Core Capabilities

  • Semantic similarity assessment for Spanish text
  • Generation of sentence embeddings
  • Cross-sentence comparison and analysis
  • Pearson correlation of 0.74, up from 0.29 for the base model
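
A minimal sketch of how these capabilities might be used: encode two Spanish sentences and compare their embeddings with cosine similarity. The Hub identifier `MODEL_ID` is an assumption inferred from the author and model name, and loading it through `SentenceTransformer` is likewise assumed rather than confirmed by this page. The `spanish_similarity` helper is hypothetical and needs the `sentence-transformers` package plus network access to download the model.

```python
from math import sqrt

# Assumed Hugging Face Hub ID, inferred from the author and model name.
MODEL_ID = "eduardofv/stsb-m-mt-es-distilbert-base-uncased"


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def spanish_similarity(sentence_a, sentence_b, model_id=MODEL_ID):
    """Embed two sentences and return their cosine similarity."""
    # Imported lazily so cosine_similarity works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    # encode() returns one embedding row per input sentence.
    emb_a, emb_b = model.encode([sentence_a, sentence_b])
    return cosine_similarity(emb_a.tolist(), emb_b.tolist())
```

A call such as `spanish_similarity("El gato duerme en el sofá.", "Un gato está durmiendo en el sillón.")` would return a score where higher values indicate closer meaning.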

Frequently Asked Questions

Q: What makes this model unique?

The model focuses specifically on semantic similarity in Spanish, and fine-tuning raised its Pearson correlation from 0.29 for the base model to 0.74. It is most useful for Spanish-specific NLP tasks that require semantic understanding.

Q: What are the recommended use cases?

While primarily developed as a proof-of-concept, the model is suitable for Spanish language tasks including semantic similarity analysis, text comparison, and sentence embedding generation. It's particularly useful for research and development in Spanish NLP applications.
