roberta-base-bne-finetuned-msmarco-qa-es-mnrl-mn
Property | Value |
---|---|
Base Model | PlanTL-GOB-ES/roberta-base-bne |
Embedding Dimension | 768 |
Max Sequence Length | 512 |
Training Dataset | IIC/ms_marco_es |
License | Apache License 2.0 |
What is roberta-base-bne-finetuned-msmarco-qa-es-mnrl-mn?
This is a Spanish sentence-transformer model designed for question answering and semantic search. Built on RoBERTa-BNE (PlanTL-GOB-ES/roberta-base-bne), it was fine-tuned on a Spanish translation of the MS-MARCO dataset (IIC/ms_marco_es) using Multiple Negatives Ranking Loss (MNRL).
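To make the training objective concrete, here is a minimal pure-Python sketch of Multiple Negatives Ranking Loss as it is commonly formulated: each question's paired answer is the positive, and every other answer in the batch acts as a negative. The function names, the toy 2-dimensional vectors, and the `scale` value of 20 are assumptions for illustration; the source does not state the exact settings used in training.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnrl_loss(question_vecs, answer_vecs, scale=20.0):
    """Mean cross-entropy where answer i is the positive for question i
    and all other answers in the batch serve as negatives.
    `scale` is an assumed value, not taken from the model card."""
    losses = []
    for i, q in enumerate(question_vecs):
        scores = [scale * cosine(q, a) for a in answer_vecs]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)

# Toy 2-dimensional "embeddings" (hypothetical, for illustration only).
questions = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # answer i matches question i
swapped = [aligned[1], aligned[0]]   # answers paired with the wrong question

loss_aligned = mnrl_loss(questions, aligned)
loss_swapped = mnrl_loss(questions, swapped)
```

In this toy batch the loss is close to zero when each answer lies near its own question and much larger when the pairs are swapped. That gap is the signal that pulls matching question-answer embeddings together during fine-tuning.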
Implementation Details
The model maps Spanish text to 768-dimensional dense vectors. It was fine-tuned with a learning rate of 2e-05, a batch size of 16, and 10 epochs, on 481,335 samples from the translated MS-MARCO dataset.
- Uses the sentence-transformers framework
- Trained with Multiple Negatives Ranking Loss for semantic matching
- Supports a maximum sequence length of 512 tokens
- Targets Spanish question-answering tasks
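A minimal sketch of how such a model might be loaded and used with sentence-transformers follows. `MODEL_NAME` holds only the model name given above; the Hugging Face Hub namespace is not stated here, so the full ID is left for the reader to supply. `load_model` and `embed` are hypothetical helper names, and the dimension check simply reflects the 768-dimensional output listed in the table.

```python
# Assumption: prepend the Hub namespace to form the full model ID;
# the namespace is not given in this document.
MODEL_NAME = "roberta-base-bne-finetuned-msmarco-qa-es-mnrl-mn"
EXPECTED_DIM = 768   # embedding dimension listed for this model
MAX_SEQ_LENGTH = 512  # maximum sequence length listed for this model

def load_model(model_id=MODEL_NAME):
    """Load the model; the library is imported lazily so the helpers
    below can be exercised without it installed."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_id)
    # Longer inputs are truncated to this many tokens.
    model.max_seq_length = MAX_SEQ_LENGTH
    return model

def embed(model, texts):
    """Encode texts into unit-length vectors and sanity-check their shape."""
    vectors = model.encode(
        texts,
        convert_to_numpy=True,
        normalize_embeddings=True,
    )
    if len(vectors) != len(texts) or len(vectors[0]) != EXPECTED_DIM:
        raise ValueError("unexpected embedding shape")
    return vectors
```

With the full model ID filled in, `embed(load_model("<namespace>/" + MODEL_NAME), ["¿Cuál es la capital de España?"])` should return one vector per input sentence.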
Core Capabilities
- Semantic search and text similarity comparison in Spanish
- Question-answer matching and retrieval
- Text embedding generation for downstream tasks
- Efficient corpus searching and document retrieval
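The retrieval use cases above come down to ranking passages by the similarity of their embeddings to a query embedding. The sketch below shows that ranking step in plain Python with cosine similarity. The 3-dimensional vectors are hypothetical stand-ins for the real 768-dimensional embeddings, and `top_k_passages` is an illustrative name rather than part of any library.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k_passages(query_vec, passage_vecs, passages, k=3):
    """Return the k (score, passage) pairs with the highest cosine similarity."""
    q = normalize(query_vec)
    scored = []
    for text, vec in zip(passages, passage_vecs):
        p = normalize(vec)
        score = sum(a * b for a, b in zip(q, p))
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

passages = [
    "Madrid es la capital de España.",
    "El Teide es un volcán en Tenerife.",
    "La paella es un plato valenciano.",
]
# Hypothetical toy vectors; in practice these would come from the model.
toy_passage_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
# Stands in for the embedding of "¿Cuál es la capital de España?"
toy_query_vec = [1.0, 0.2, 0.0]

results = top_k_passages(toy_query_vec, toy_passage_vecs, passages, k=2)
```

For larger corpora, sentence-transformers also provides `util.semantic_search`, which performs the same kind of top-k ranking over batches of embeddings.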
Frequently Asked Questions
Q: What makes this model unique?
It combines the RoBERTa-BNE encoder with fine-tuning on Spanish question-answer data (the translated MS-MARCO dataset), so it targets Spanish information retrieval and semantic search rather than general-purpose language modeling.
Q: What are the recommended use cases?
The model is intended for Spanish applications that need semantic search, question answering, document similarity comparison, or information retrieval. It is particularly suited to matching questions with their answers in Spanish.