XLM-RoBERTa Large SQuAD2
Property | Value |
---|---|
Base Architecture | XLM-RoBERTa Large |
Task | Extractive Question Answering |
Training Data | SQuAD 2.0 |
Languages | Multilingual |
Author | deepset |
Model URL | deepset/xlm-roberta-large-squad2 |
What is xlm-roberta-large-squad2?
XLM-RoBERTa Large SQuAD2 is a multilingual question answering model built on the XLM-RoBERTa large architecture and fine-tuned on the SQuAD 2.0 dataset. It performs extractive QA across multiple languages, reaching 79.46% exact match and 83.79% F1 on the English SQuAD 2.0 dev set, and it also reports results on the German MLQA and XQuAD datasets (61.51% exact match on German XQuAD).
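To make the exact match and F1 figures quoted for this model easier to interpret, here is a minimal sketch of SQuAD-style scoring for a single prediction. It is a simplified illustration, not the evaluation code used to produce the reported numbers: the function names are hypothetical, and the normalization (lowercasing, stripping ASCII punctuation and English articles) follows the common English convention and may not suit other languages.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    # An empty string stands for "no answer": score 1.0 only if both are empty.
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores are typically the average of these per-question values, taking the best score over all reference answers for each question.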
Implementation Details
The model was trained for 3 epochs with a batch size of 32 and a maximum sequence length of 256. It uses a linear warmup learning rate schedule with a warmup proportion of 0.2 and a base learning rate of 1e-5. Training ran on 4 Tesla V100 GPUs.
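The learning rate schedule described above can be sketched as follows. This is an illustration under stated assumptions, not the training code: the function name is hypothetical, and the linear decay to zero after warmup is an assumption borrowed from the common linear-warmup schedule, since the description only states the warmup proportion and base rate.

```python
def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = 1e-5,
                     warmup_proportion: float = 0.2) -> float:
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr over the first warmup_proportion of
    training, then (assumed) decays linearly back to 0 by the final step.
    """
    warmup_steps = round(total_steps * warmup_proportion)
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / decay_steps)
```

With 1,000 total steps, the rate climbs during the first 200 steps, peaks at 1e-5 at step 200, and falls off over the remaining 800.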
- Maximum query length: 64 tokens
- Document stride: 128 tokens
- Base model: xlm-roberta-large
- Integration support for both Haystack and Transformers libraries
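The sequence length, query length, and document stride settings interact when a passage is too long for one input. The sketch below shows one way the windows could be laid out. It assumes that document stride means the distance between consecutive window starts (some tooling uses the term for the overlap instead) and that a question/passage pair needs 4 special tokens. Both are assumptions, and the function name is hypothetical.

```python
from typing import List, Tuple


def split_into_windows(n_passage_tokens: int, n_query_tokens: int,
                       max_seq_len: int = 256, max_query_len: int = 64,
                       doc_stride: int = 128,
                       n_special_tokens: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) passage-token offsets for each model input."""
    if doc_stride <= 0:
        raise ValueError("doc_stride must be positive")
    # The question is truncated to max_query_len tokens.
    query_len = min(n_query_tokens, max_query_len)
    # Room left for passage tokens once query and special tokens are placed.
    capacity = max_seq_len - query_len - n_special_tokens
    if capacity <= 0:
        raise ValueError("no room left for passage tokens")
    windows = []
    start = 0
    while True:
        end = min(start + capacity, n_passage_tokens)
        windows.append((start, end))
        if end >= n_passage_tokens:
            break
        start += doc_stride  # assumed: stride is the step between window starts
    return windows
```

For a 500-token passage and a 10-token question, each window holds up to 242 passage tokens, so the passage is covered by four overlapping windows. An answer span near a window boundary is then likely to appear whole in at least one window.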
Core Capabilities
- Multilingual extractive question answering
- High performance on English QA (79.46% exact match, 83.79% F1 score)
- Strong German-language results (61.51% exact match on the German XQuAD set)
- No-answer detection capability
- Scalable document processing
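As a rough illustration of the question answering and no-answer capabilities listed above, the following sketch loads the model through the Transformers `question-answering` pipeline. The helper names (`build_qa_pipeline`, `extract_answer`, `demo`), the sample question and context, and the `min_score` threshold are all hypothetical choices for this example. The sketch assumes the `transformers` package and a backend such as PyTorch are installed.

```python
from typing import Optional

MODEL_NAME = "deepset/xlm-roberta-large-squad2"


def build_qa_pipeline():
    """Create a question-answering pipeline (downloads weights on first use)."""
    from transformers import pipeline  # imported lazily to keep import cheap
    return pipeline("question-answering", model=MODEL_NAME, tokenizer=MODEL_NAME)


def extract_answer(result: dict, min_score: float = 0.0) -> Optional[str]:
    """Return the answer text, or None when the model predicts no answer.

    With handle_impossible_answer=True the pipeline can return an empty
    string as the answer; that case is mapped to None here. min_score is an
    illustrative cutoff, not a recommended value.
    """
    answer = result.get("answer", "").strip()
    if not answer or result.get("score", 0.0) < min_score:
        return None
    return answer


def demo() -> Optional[str]:
    """Run one illustrative query; the text below is made-up sample data."""
    qa = build_qa_pipeline()
    result = qa(
        question="Wo wurde das Modell trainiert?",
        context="Das Modell wurde auf dem SQuAD 2.0 Datensatz trainiert.",
        handle_impossible_answer=True,  # allow an empty "no answer" prediction
    )
    return extract_answer(result)
```

Calling `demo()` returns either an answer span copied from the context or `None`. Keeping `extract_answer` separate from the pipeline call makes the no-answer handling easy to test without loading the model.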
Frequently Asked Questions
Q: What makes this model unique?
This model combines the multilingual capabilities of XLM-RoBERTa with extractive question answering, which makes it useful for organizations that need multilingual QA. Its performance across languages and its ability to handle no-answer scenarios make it versatile.
Q: What are the recommended use cases?
The model is suited to building multilingual question answering systems, document search applications, and information extraction tools. It fits applications that require cross-lingual capabilities and can be integrated into production systems using frameworks like Haystack.