xlm-roberta-base-squad2-distilled

Maintained By: deepset

XLM-RoBERTa Base Squad2 Distilled

  • Parameter Count: 277M
  • License: MIT
  • Training Data: SQuAD 2.0
  • Language Support: Multilingual
  • Framework: PyTorch

What is xlm-roberta-base-squad2-distilled?

This is a multilingual question-answering model that uses knowledge distillation from an XLM-RoBERTa large teacher to produce a more efficient base-sized model. Trained on SQuAD 2.0, it reaches 74.07% exact match and a 76.40% F1 score while keeping a smaller footprint than its teacher.
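
For quick experimentation, the model can be loaded through the Hugging Face Transformers question-answering pipeline. The snippet below is a minimal usage sketch, assuming the Hub id deepset/xlm-roberta-base-squad2-distilled; the German question and context are illustrative only.

```python
from transformers import pipeline

model_name = "deepset/xlm-roberta-base-squad2-distilled"  # assumed Hub id

# Extractive QA pipeline; the same checkpoint provides both model and tokenizer.
qa = pipeline("question-answering", model=model_name, tokenizer=model_name)

# Illustrative multilingual example: a German question over a German context.
result = qa(
    question="Warum ist Distillation nützlich?",
    context=(
        "Distillation überträgt das Wissen eines großen Lehrermodells "
        "auf ein kleineres Schülermodell, das günstiger zu betreiben ist."
    ),
    handle_impossible_answer=True,  # optional: allow SQuAD 2.0-style "no answer"
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```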

Implementation Details

The model was trained using Haystack's distillation feature with carefully tuned hyperparameters including a batch size of 56, 4 training epochs, and a maximum sequence length of 384. It employs linear warmup scheduling with a learning rate of 3e-5 and uses temperature scaling (T=3) for distillation.

  • Optimized for extractive question answering across multiple languages
  • Implements knowledge distillation with a 0.75 distillation loss weight (see the loss sketch after this list)
  • Trained on Tesla V100 GPUs
  • Supports integration with both the Haystack and Transformers frameworks
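
The actual training loop is handled by Haystack's distillation feature; as a hedged illustration only, the sketch below shows how the reported settings (temperature T=3, distillation loss weight 0.75) would typically blend a temperature-scaled soft-target term with standard cross-entropy in plain PyTorch. The function name and tensor shapes are assumptions, not deepset's code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=3.0, distill_weight=0.75):
    """Blend a soft-target KD term with hard-label cross-entropy.

    The temperature (3) and weight (0.75) mirror the card's reported
    hyperparameters; everything else is an illustrative assumption.
    """
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 as is conventional for knowledge distillation.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: ordinary cross-entropy against the gold labels
    # (e.g. answer start/end positions in extractive QA).
    hard = F.cross_entropy(student_logits, labels)

    return distill_weight * soft + (1.0 - distill_weight) * hard
```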

Core Capabilities

  • Multilingual extractive question answering
  • Efficient inference with reduced model size
  • Easy integration with popular NLP frameworks (see the Haystack sketch after this list)
  • Production-ready performance
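
As one example of framework integration, the sketch below loads the checkpoint into Haystack's FARMReader, assuming Haystack 1.x and the Hub id deepset/xlm-roberta-base-squad2-distilled; the ad-hoc document and query are illustrative, and in a real pipeline the documents would normally come from a retriever.

```python
# Assumes Haystack 1.x, where FARMReader wraps Hugging Face QA checkpoints.
from haystack.nodes import FARMReader
from haystack.schema import Document

reader = FARMReader(model_name_or_path="deepset/xlm-roberta-base-squad2-distilled")

# Illustrative ad-hoc document; in production these usually come from a retriever.
docs = [Document(content="SQuAD 2.0 extends SQuAD with questions that have no answer in the passage.")]

prediction = reader.predict(query="What does SQuAD 2.0 add to SQuAD?", documents=docs)
print(prediction["answers"][0].answer)
```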

Frequently Asked Questions

Q: What makes this model unique?

The model pairs multilingual understanding with the efficiency gains of knowledge distillation: it maintains high performance across multiple languages while being cheaper to run than its XLM-RoBERTa large teacher, which makes it particularly valuable in production environments where resource optimization is crucial.

Q: What are the recommended use cases?

This model is ideal for building multilingual question-answering systems, especially in production environments where efficiency matters. It is particularly well suited to applications that need extractive QA across different languages while keeping resource requirements reasonable.
