roberta-base-squad2-distilled

Maintained By: deepset


Property             Value
Parameter Count      124M
License              MIT
Training Data        SQuAD 2.0
Base Architecture    RoBERTa

What is roberta-base-squad2-distilled?

This is a distilled version of RoBERTa-base fine-tuned for question answering tasks, specifically optimized on the SQuAD 2.0 dataset. Developed by deepset, it achieves an impressive 80.86% exact match score while maintaining efficiency through knowledge distillation from a larger teacher model (roberta-large-squad2).

Implementation Details

The model was trained using 4x V100 GPUs with carefully tuned hyperparameters including a batch size of 80, 4 epochs, and a maximum sequence length of 384. The distillation process used a temperature of 1.5 and a distillation loss weight of 0.75, balancing performance and model size reduction.

  • Linear warmup learning rate schedule with 3e-5 base rate
  • Embeddings dropout probability of 0.1
  • Optimized for production deployment with Haystack framework support (see the loading sketch after this list)
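
As a minimal usage sketch (assuming the model is published under the Hugging Face Hub ID deepset/roberta-base-squad2-distilled; the question and context are illustrative only), the model can be loaded through the Transformers question-answering pipeline:

```python
from transformers import pipeline

# Assumed Hub ID for this model card; swap in a local path if the weights live on disk.
model_id = "deepset/roberta-base-squad2-distilled"

qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

context = (
    "Knowledge distillation trains a compact student model to mimic the predictions "
    "of a larger teacher model, here roberta-large-squad2."
)
result = qa(question="Which teacher model was used?", context=context)

# The pipeline returns a dict with the extracted span plus its confidence score
# and character offsets into the context.
print(result["answer"], result["score"])
```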

Core Capabilities

  • Extractive Question Answering with 84.01% F1 score on SQuAD 2.0
  • Robust performance across different domains (NYT: 91.52% F1, New Wiki: 91.09% F1)
  • Efficient inference with reduced model size
  • Native integration with Haystack and Transformers libraries (a Haystack sketch follows this list)
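
For Haystack, a sketch along these lines should work, assuming Haystack 1.x where extractive QA models plug into FARMReader (Haystack 2.x uses the ExtractiveReader component instead; the document text is invented for illustration):

```python
from haystack import Document
from haystack.nodes import FARMReader

# Load the distilled reader (assumed Hub ID); max_seq_len mirrors the training setting above.
reader = FARMReader(
    model_name_or_path="deepset/roberta-base-squad2-distilled",
    max_seq_len=384,
)

docs = [Document(content="deepset distilled roberta-base-squad2 from a roberta-large-squad2 teacher.")]
prediction = reader.predict(query="What was the teacher model?", documents=docs, top_k=1)

# predict() returns a dict whose "answers" list holds Answer objects with the extracted span.
print(prediction["answers"][0].answer)
```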

Frequently Asked Questions

Q: What makes this model unique?

The model combines the RoBERTa architecture with knowledge distillation to achieve near state-of-the-art performance while being smaller and faster at inference than its teacher model. It's particularly notable for maintaining high accuracy across various domains while being production-ready.

Q: What are the recommended use cases?

This model excels in extractive question answering tasks, particularly for production environments where efficiency is crucial. It's ideal for applications requiring accurate answer extraction from documents, with special strength in handling both answerable and unanswerable questions due to its SQuAD 2.0 training.
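
As a sketch of that answerable/unanswerable distinction (the model ID and example text are assumptions, not taken from the card), the Transformers pipeline exposes a handle_impossible_answer flag that lets a SQuAD 2.0-trained model return an empty answer instead of forcing a span:

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2-distilled")

context = "The distilled model was trained on SQuAD 2.0 using a roberta-large-squad2 teacher."

# Answerable question: a span from the context is extracted.
print(qa(question="Which dataset was the model trained on?", context=context))

# Unanswerable question: with handle_impossible_answer=True the model may return an
# empty answer string, reflecting its SQuAD 2.0 training on no-answer examples.
print(qa(
    question="How long did training take?",
    context=context,
    handle_impossible_answer=True,
))
```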
