distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es

Maintained by: mrm8488

Spanish DistilBERT for Question Answering

License: Apache 2.0
Author: Manuel Romero (mrm8488)
Training Dataset: SQuAD2.0-es (111K Q&A pairs)
Framework: PyTorch, Transformers

What is distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es?

This model is a distilled version of BETO (Spanish BERT) fine-tuned specifically for question answering. Through knowledge distillation, using bert-base-multilingual-cased as the teacher model, it achieves faster inference while maintaining strong performance on Spanish-language Q&A tasks.

Implementation Details

The model was trained on a Tesla P100 GPU with 25GB of RAM, using knowledge distillation to produce a more efficient version of the original BETO model. Training ran for 5 epochs with a learning rate of 3e-5 and a maximum sequence length of 384 tokens; a sketch of how these preprocessing settings are applied appears after the list below.

  • Utilizes the whole word masking (WWM) approach
  • Fine-tuned on the Spanish SQuAD2.0 dataset (111K training examples)
  • Uses a document stride of 128 tokens for long contexts
  • Trained with a batch size of 12
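
The sequence-length and document-stride values above correspond to how SQuAD-style examples are windowed during tokenization. The snippet below is a minimal sketch of that preprocessing, not the exact training script: it assumes the standard transformers AutoTokenizer API, and the question and context strings are purely illustrative.

```python
from transformers import AutoTokenizer

# Tokenizer of the published checkpoint
tokenizer = AutoTokenizer.from_pretrained(
    "mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es"
)

question = "¿Quién escribió Don Quijote?"  # illustrative example
context = "Don Quijote de la Mancha fue escrito por Miguel de Cervantes."

# max_length=384 and stride=128 mirror the values reported above:
# contexts longer than the window are split into overlapping spans
# rather than being truncated outright.
encoded = tokenizer(
    question,
    context,
    max_length=384,
    stride=128,
    truncation="only_second",        # only the context is truncated
    return_overflowing_tokens=True,  # one entry per 384-token window
    padding="max_length",
)

print(len(encoded["input_ids"]), "window(s) of", len(encoded["input_ids"][0]), "tokens")
```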

Core Capabilities

  • Efficient Spanish language question answering
  • Handles unanswerable questions (SQuAD2.0 style)
  • Faster inference than the full BERT model
  • Compatible with Hugging Face pipelines (see the usage sketch below)
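
Because the checkpoint uses a standard question-answering head, it can be loaded directly with the Hugging Face pipeline API. The following is a minimal usage sketch, assuming transformers and PyTorch are installed; the question and context are illustrative only, and handle_impossible_answer enables the SQuAD2.0-style "no answer" behaviour mentioned above.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint into a standard QA pipeline
qa = pipeline(
    "question-answering",
    model="mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es",
)

context = (
    "Manuel Romero ha estado trabajando activamente en modelos de "
    "pregunta-respuesta en español."
)

result = qa(
    question="¿En qué ha estado trabajando Manuel Romero?",
    context=context,
    handle_impossible_answer=True,  # allow an empty answer, SQuAD2.0 style
)

print(result["answer"], result["score"])
```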

Frequently Asked Questions

Q: What makes this model unique?

This model combines the efficiency of knowledge distillation with Spanish language expertise, making it significantly faster and lighter than its teacher model while maintaining strong performance on Q&A tasks.

Q: What are the recommended use cases?

The model is ideal for Spanish language question answering systems, chatbots, and information extraction tasks where efficiency and performance are crucial. It's particularly well-suited for applications with resource constraints.
