# mdeberta-v3-base-squad2
| Property | Value |
|---|---|
| Parameters | 278M |
| License | MIT |
| Paper | DeBERTa V3 Paper |
| Languages | 94 languages |
| Training Data | SQuAD2.0 |
## What is mdeberta-v3-base-squad2?
mdeberta-v3-base-squad2 is a multilingual extractive question-answering model built on Microsoft's DeBERTa-V3 architecture and fine-tuned on the SQuAD2.0 dataset. It supports 94 languages and reaches an F1 score of 84% on the SQuAD2.0 dev set.
## Implementation Details
The model is built on the mDeBERTa-V3 base architecture, with 12 layers and a hidden size of 768. The backbone accounts for 86M parameters, and the 250K-token vocabulary adds another 190M parameters in the embedding layer. Like XLM-R, the backbone was pre-trained on 2.5T of CC100 data; it was then fine-tuned for 3 epochs on SQuAD2.0. A configuration sanity check follows the list below.
- Uses disentangled attention and an enhanced mask decoder
- Implements ELECTRA-style pre-training with Gradient-Disentangled Embedding Sharing (GDES)
- Achieves a 79.66% exact-match score on answerable questions
- Reaches 82.10% accuracy on no-answer questions
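The architecture figures above can be read straight off the checkpoint's configuration. A minimal sketch, assuming the model is published on the Hugging Face Hub (the repository ID below is an assumption; substitute the actual path):

```python
from transformers import AutoConfig, AutoTokenizer

# Assumed Hub repository ID -- substitute the actual path if it differs.
model_id = "timpal0l/mdeberta-v3-base-squad2"

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(config.num_hidden_layers)  # expected: 12
print(config.hidden_size)        # expected: 768
print(config.vocab_size)         # expected: ~250K tokens
```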
## Core Capabilities
- Extractive Question Answering across 94 languages
- Handles both answerable and unanswerable questions
- Efficient inference with the PyTorch backend
- Compatible with the Hugging Face Transformers `pipeline` API (see the sketch below)
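As a minimal illustration of the pipeline compatibility listed above (the repository ID is again an assumption):

```python
from transformers import pipeline

# Assumed Hub repository ID -- substitute the actual path if it differs.
qa = pipeline("question-answering", model="timpal0l/mdeberta-v3-base-squad2")

context = "Amsterdam is the capital and most populous city of the Netherlands."
result = qa(question="What is the capital of the Netherlands?", context=context)
print(result["answer"])  # expected span: 'Amsterdam'
```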
## Frequently Asked Questions
**Q: What makes this model unique?**
This model combines the DeBERTa-V3 architecture with multilingual pre-training, making it well suited to cross-lingual question answering. Because it was fine-tuned on SQuAD2.0, it can not only extract answers but also recognize when a question cannot be answered from the given context, as the sketch below illustrates.
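The Transformers question-answering pipeline exposes this SQuAD2.0 behaviour through its `handle_impossible_answer` flag; a brief sketch (same assumed repository ID as above):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="timpal0l/mdeberta-v3-base-squad2")

context = "Amsterdam is the capital of the Netherlands."

# With handle_impossible_answer=True, the pipeline may return an empty
# answer string when the context does not contain one.
result = qa(
    question="Who painted the Mona Lisa?",
    context=context,
    handle_impossible_answer=True,
)
print(result)  # an empty 'answer' indicates a SQuAD2.0-style no-answer
```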
**Q: What are the recommended use cases?**
The model is ideal for multilingual question answering systems, chatbots, and information extraction applications. It's particularly useful for organizations requiring QA capabilities across multiple languages without deploying separate models for each language.
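Because a single checkpoint covers all supported languages, the question and context do not even need to share a language. A sketch of cross-lingual use (assumed repository ID as above):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="timpal0l/mdeberta-v3-base-squad2")

# Swedish context, English question -- one model serves both languages.
context = "Stockholm är huvudstaden i Sverige."
result = qa(question="What is the capital of Sweden?", context=context)
print(result["answer"])  # expected span: 'Stockholm'
```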