tinyroberta-squad2

Maintained by deepset

TinyRoBERTa-SQuAD2

  • Parameter Count: 81.5M
  • License: CC-BY-4.0
  • Paper: TinyBERT Paper
  • Task: Extractive Question Answering
  • Training Data: SQuAD 2.0

What is tinyroberta-squad2?

TinyRoBERTa-SQuAD2 is a distilled version of the RoBERTa base model specifically optimized for extractive question answering tasks. Using the TinyBERT approach, this model achieves comparable performance to its larger counterpart while running at twice the speed. With 81.5M parameters, it delivers an impressive 78.86% exact match score on SQuAD 2.0.

Implementation Details

The model underwent a two-stage distillation process: first performing intermediate layer distillation with RoBERTa-base as the teacher, followed by task-specific distillation using RoBERTa-large-squad2. The model was trained with a batch size of 96 over 4 epochs, using a learning rate of 3e-5 and linear warmup scheduling.

  • Maximum sequence length: 384 tokens
  • Document stride: 128
  • Maximum query length: 64
  • Distillation temperature: 1.5
  • Distillation loss weight: 0.75
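To make the distillation hyperparameters concrete, here is a minimal sketch of a temperature-scaled soft-target loss of the kind used in TinyBERT-style distillation. This is an illustrative reconstruction, not deepset's actual training code: the function names are hypothetical, and the assumption that the 0.75 weight balances the soft (teacher) term against the hard (gold-label) term is an interpretation of the card's figures.

```python
import math

TEMPERATURE = 1.5  # distillation temperature from the model card
KD_WEIGHT = 0.75   # assumed weight on the soft-target (teacher) term

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_i p_i * log(q_i)."""
    eps = 1e-12
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def distillation_loss(student_logits, teacher_logits, gold_index):
    """Weighted sum of a soft-target loss (vs. the teacher's
    temperature-softened distribution) and a hard-label loss
    (vs. the gold answer position)."""
    soft_teacher = softmax(teacher_logits, TEMPERATURE)
    soft_student = softmax(student_logits, TEMPERATURE)
    soft_loss = cross_entropy(soft_teacher, soft_student)

    hard_targets = [1.0 if i == gold_index else 0.0
                    for i in range(len(student_logits))]
    hard_loss = cross_entropy(hard_targets, softmax(student_logits))

    return KD_WEIGHT * soft_loss + (1.0 - KD_WEIGHT) * hard_loss
```

In an extractive-QA setting this loss would be applied to the start- and end-position logits over the 384-token window; a student that tracks the teacher's distribution scores lower than one that ignores it.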

Core Capabilities

  • Efficient extractive QA with 2x speed improvement
  • Strong performance on SQuAD 2.0 (78.86% EM, 82.04% F1)
  • Robust performance on domain shifts (80.30% EM on new Wikipedia data)
  • Handles both answerable and unanswerable questions

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for achieving near-base-model performance while being significantly faster through careful distillation. It maintains high accuracy (78.86% EM) while requiring only half the computational resources.

Q: What are the recommended use cases?

The model is ideal for production environments where speed and efficiency matter, particularly for extractive QA tasks. It's especially suitable for applications requiring real-time question answering over documents while maintaining high accuracy.
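A minimal usage sketch with the Hugging Face `transformers` question-answering pipeline is shown below (assuming `transformers` and a compatible backend are installed; the example context string is made up for illustration). Setting `handle_impossible_answer=True` lets the pipeline return an empty answer when the question is unanswerable, matching SQuAD 2.0's setup.

```python
from transformers import pipeline

# Load the model from the Hugging Face Hub (weights download on first use).
qa = pipeline("question-answering", model="deepset/tinyroberta-squad2")

context = (
    "TinyRoBERTa-SQuAD2 is a distilled RoBERTa model for extractive "
    "question answering, trained on SQuAD 2.0 by deepset."
)

# handle_impossible_answer=True allows an empty answer for
# unanswerable questions, as in SQuAD 2.0.
result = qa(
    question="Which dataset was the model trained on?",
    context=context,
    handle_impossible_answer=True,
)
print(result["answer"])
```

The pipeline returns a dict with the extracted `answer` span, a confidence `score`, and the `start`/`end` character offsets into the context.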
