roberta-base-squad2-distilled

Maintained By: deepset


Property             Value
Parameter Count      124M
License              MIT
Training Data        SQuAD 2.0
Base Architecture    RoBERTa

What is roberta-base-squad2-distilled?

This is a distilled version of RoBERTa-base fine-tuned for question answering tasks, specifically optimized on the SQuAD 2.0 dataset. Developed by deepset, it achieves an impressive 80.86% exact match score while maintaining efficiency through knowledge distillation from a larger teacher model (roberta-large-squad2).

Implementation Details

The model was trained using 4x V100 GPUs with carefully tuned hyperparameters including a batch size of 80, 4 epochs, and a maximum sequence length of 384. The distillation process used a temperature of 1.5 and a distillation loss weight of 0.75, balancing performance and model size reduction.

  • Linear warmup learning rate schedule with 3e-5 base rate
  • Embeddings dropout probability of 0.1
  • Optimized for production deployment with Haystack framework support (see the loading sketch after this list)
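
As a minimal usage sketch (assuming the model is published under the Hugging Face Hub ID deepset/roberta-base-squad2-distilled; the question and context are illustrative only), the model can be loaded through the Transformers question-answering pipeline:

```python
from transformers import pipeline

# Assumed Hub ID for this model card; swap in a local path if the weights live on disk.
model_id = "deepset/roberta-base-squad2-distilled"

qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

context = (
    "Knowledge distillation trains a compact student model to mimic the predictions "
    "of a larger teacher model, here roberta-large-squad2."
)
result = qa(question="Which teacher model was used?", context=context)

# The pipeline returns a dict with the extracted span plus its confidence score
# and character offsets into the context.
print(result["answer"], result["score"])
```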

Core Capabilities

  • Extractive Question Answering with 84.01% F1 score on SQuAD 2.0
  • Robust performance across different domains (NYT: 91.52% F1, New Wiki: 91.09% F1)
  • Efficient inference with reduced model size
  • Native integration with Haystack and Transformers libraries (a Haystack sketch follows this list)
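
For Haystack, a sketch along these lines should work, assuming Haystack 1.x where extractive QA models plug into FARMReader (Haystack 2.x uses the ExtractiveReader component instead; the document text is invented for illustration):

```python
from haystack import Document
from haystack.nodes import FARMReader

# Load the distilled reader (assumed Hub ID); max_seq_len mirrors the training setting above.
reader = FARMReader(
    model_name_or_path="deepset/roberta-base-squad2-distilled",
    max_seq_len=384,
)

docs = [Document(content="deepset distilled roberta-base-squad2 from a roberta-large-squad2 teacher.")]
prediction = reader.predict(query="What was the teacher model?", documents=docs, top_k=1)

# predict() returns a dict whose "answers" list holds Answer objects with the extracted span.
print(prediction["answers"][0].answer)
```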

Frequently Asked Questions

Q: What makes this model unique?

The model combines the RoBERTa architecture with knowledge distillation to achieve near state-of-the-art performance while being smaller and faster at inference than its teacher model. It's particularly notable for maintaining high accuracy across various domains while being production-ready.

Q: What are the recommended use cases?

This model excels in extractive question answering tasks, particularly for production environments where efficiency is crucial. It's ideal for applications requiring accurate answer extraction from documents, with special strength in handling both answerable and unanswerable questions due to its SQuAD 2.0 training.
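
As a sketch of that answerable/unanswerable distinction (the model ID and example text are assumptions, not taken from the card), the Transformers pipeline exposes a handle_impossible_answer flag that lets a SQuAD 2.0-trained model return an empty answer instead of forcing a span:

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2-distilled")

context = "The distilled model was trained on SQuAD 2.0 using a roberta-large-squad2 teacher."

# Answerable question: a span from the context is extracted.
print(qa(question="Which dataset was the model trained on?", context=context))

# Unanswerable question: with handle_impossible_answer=True the model may return an
# empty answer string, reflecting its SQuAD 2.0 training on no-answer examples.
print(qa(
    question="How long did training take?",
    context=context,
    handle_impossible_answer=True,
))
```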
