# mdeberta-v3-base-squad2
| Property | Value |
|---|---|
| Parameters | 278M |
| License | MIT |
| Paper | DeBERTa V3 Paper |
| Languages | 94 languages |
| Training Data | SQuAD2.0 |
## What is mdeberta-v3-base-squad2?
mdeberta-v3-base-squad2 is a multilingual extractive question-answering model built on Microsoft's DeBERTa-V3 architecture and fine-tuned on the SQuAD2.0 dataset. It supports 94 languages and reaches an F1 score of 84% on the SQuAD2.0 dev set.
## Implementation Details
The model is built on the mDeBERTa-V3 base architecture, with 12 layers and a hidden size of 768. The backbone accounts for 86M parameters, and the 250K-token vocabulary adds another 190M parameters in the embedding layer. Like XLM-R, the backbone was pre-trained on 2.5T of CC100 data; it was then fine-tuned for 3 epochs on SQuAD2.0. A configuration sanity check follows the list below.
- Uses disentangled attention and an enhanced mask decoder
- Implements ELECTRA-style pre-training with Gradient-Disentangled Embedding Sharing (GDES)
- Achieves a 79.66% exact-match score on answerable questions
- Reaches 82.10% accuracy on no-answer questions
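The architecture figures above can be read straight off the checkpoint's configuration. A minimal sketch, assuming the model is published on the Hugging Face Hub (the repository ID below is an assumption; substitute the actual path):

```python
from transformers import AutoConfig, AutoTokenizer

# Assumed Hub repository ID -- substitute the actual path if it differs.
model_id = "timpal0l/mdeberta-v3-base-squad2"

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(config.num_hidden_layers)  # expected: 12
print(config.hidden_size)        # expected: 768
print(config.vocab_size)         # expected: ~250K tokens
```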
## Core Capabilities
- Extractive Question Answering across 94 languages
- Handles both answerable and unanswerable questions
- Efficient inference with the PyTorch backend
- Compatible with the Hugging Face Transformers `pipeline` API (see the sketch below)
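As a minimal illustration of the pipeline compatibility listed above (the repository ID is again an assumption):

```python
from transformers import pipeline

# Assumed Hub repository ID -- substitute the actual path if it differs.
qa = pipeline("question-answering", model="timpal0l/mdeberta-v3-base-squad2")

context = "Amsterdam is the capital and most populous city of the Netherlands."
result = qa(question="What is the capital of the Netherlands?", context=context)
print(result["answer"])  # expected span: 'Amsterdam'
```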
## Frequently Asked Questions
**Q: What makes this model unique?**
This model combines the DeBERTa-V3 architecture with multilingual pre-training, making it well suited to cross-lingual question answering. Because it was fine-tuned on SQuAD2.0, it can not only extract answers but also recognize when a question cannot be answered from the given context, as the sketch below illustrates.
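The Transformers question-answering pipeline exposes this SQuAD2.0 behaviour through its `handle_impossible_answer` flag; a brief sketch (same assumed repository ID as above):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="timpal0l/mdeberta-v3-base-squad2")

context = "Amsterdam is the capital of the Netherlands."

# With handle_impossible_answer=True, the pipeline may return an empty
# answer string when the context does not contain one.
result = qa(
    question="Who painted the Mona Lisa?",
    context=context,
    handle_impossible_answer=True,
)
print(result)  # an empty 'answer' indicates a SQuAD2.0-style no-answer
```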
**Q: What are the recommended use cases?**
The model is ideal for multilingual question answering systems, chatbots, and information extraction applications. It's particularly useful for organizations requiring QA capabilities across multiple languages without deploying separate models for each language.
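Because a single checkpoint covers all supported languages, the question and context do not even need to share a language. A sketch of cross-lingual use (assumed repository ID as above):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="timpal0l/mdeberta-v3-base-squad2")

# Swedish context, English question -- one model serves both languages.
context = "Stockholm är huvudstaden i Sverige."
result = qa(question="What is the capital of Sweden?", context=context)
print(result["answer"])  # expected span: 'Stockholm'
```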