roberta-large-bne-sqac

Maintained by: PlanTL-GOB-ES

Property       Value
Developer      PlanTL-GOB-ES (Barcelona Supercomputing Center)
License        Apache License 2.0
Performance    82.02 F1 score on SQAC
Training Data  570GB of Spanish text from BNE

What is roberta-large-bne-sqac?

roberta-large-bne-sqac is a Spanish extractive Question Answering (QA) model built on the RoBERTa architecture. It was produced by fine-tuning the roberta-large-bne model on the Spanish Question Answering Corpus (SQAC). The base model was pre-trained on 570GB of clean, deduplicated text compiled from web crawls performed by the National Library of Spain (Biblioteca Nacional de España, BNE) between 2009 and 2019.

Implementation Details

The model was fine-tuned with a batch size of 16 and a learning rate of 1e-5 for 5 epochs. The final checkpoint was selected according to the downstream task metric.

  • Built on RoBERTa-large architecture
  • Fine-tuned specifically for Spanish question answering
  • Reported to outperform BETO, mBERT (multilingual), and BERTIN on SQAC
  • Base model pre-trained on 570GB of text, described by its developers as the largest Spanish corpus available at the time
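
The fine-tuning setup described above can be summarized in a few lines of Python. This is only a sketch: the class name `FineTuneConfig` and the helper `select_best_checkpoint` are hypothetical, and the idea that checkpoints are scored on a held-out development set is an assumption, since the text only says that selection used downstream task metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters stated in the text above (hypothetical container)."""
    batch_size: int = 16
    learning_rate: float = 1e-5
    num_epochs: int = 5


def select_best_checkpoint(scores):
    """Return the checkpoint name with the highest downstream score.

    `scores` maps a checkpoint name to its metric value (for example, F1
    on a development set, which is an assumption here). On ties, the
    checkpoint inserted first into the dict wins.
    """
    if not scores:
        raise ValueError("no checkpoints to choose from")
    return max(scores, key=scores.get)
```

If the Hugging Face `Trainer` is used, these values would map onto the `per_device_train_batch_size`, `learning_rate`, and `num_train_epochs` fields of `TrainingArguments`. Whether the batch size of 16 is per device or global is not stated in the text.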

Core Capabilities

  • Extractive question answering in Spanish
  • 82.02 F1 score on the SQAC test set
  • Handles complex Spanish language understanding
  • Released under the Apache 2.0 license, which permits commercial use
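
To make the F1 figure above more concrete, here is a simplified sketch of the token-overlap F1 commonly used for extractive QA. It assumes SQAC is scored in the SQuAD style; the official scorer may normalize text differently (punctuation, casing, articles), so this function, with the hypothetical name `token_f1`, will not necessarily reproduce the reported number.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-level F1 between a predicted and a reference answer string.

    Simplified: lowercases and splits on whitespace only.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score is typically the mean of this value over all questions, multiplied by 100.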

Frequently Asked Questions

Q: What makes this model unique?

According to its developers, the base model was pre-trained on the largest Spanish corpus available at the time (570GB). The fine-tuned model reaches 82.02 F1 on the SQAC test set, ahead of the models it was compared against (BETO, mBERT, and BERTIN).

Q: What are the recommended use cases?

The model is specifically designed for extractive question answering tasks in Spanish. It's ideal for applications requiring accurate answer extraction from given contexts, though users should be aware of potential biases from web-crawled training data.
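
A minimal usage sketch with the Hugging Face `transformers` question-answering pipeline follows. The Hub identifier `PlanTL-GOB-ES/roberta-large-bne-sqac` is assumed from the developer and model names above, and the helper names `load_qa_pipeline` and `answer` are hypothetical. The import is done lazily so that the helper `answer` can be used with any callable that behaves like the pipeline.

```python
def load_qa_pipeline(model_id="PlanTL-GOB-ES/roberta-large-bne-sqac"):
    """Build a question-answering pipeline (downloads the model on first use).

    The default `model_id` is an assumed Hub identifier.
    """
    from transformers import pipeline  # requires the `transformers` package
    return pipeline("question-answering", model=model_id, tokenizer=model_id)


def answer(qa, question, context):
    """Run extractive QA and return (answer_text, score, start, end).

    `start` and `end` are character offsets of the answer span in `context`.
    """
    result = qa(question=question, context=context)
    return result["answer"], result["score"], result["start"], result["end"]


# Example call (not executed here, since it downloads the model):
#   qa = load_qa_pipeline()
#   context = "La Biblioteca Nacional de España tiene su sede en Madrid."
#   text, score, start, end = answer(
#       qa, "¿Dónde tiene su sede la Biblioteca Nacional de España?", context
#   )
```

Because the model is extractive, the returned answer is always a span copied from the supplied context rather than newly generated text.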
