mdeberta-v3-base
Property | Value |
---|---|
Parameters | 86M (backbone) + 190M (embedding) |
Languages | 100+ languages covered by the CC100 corpus, including English, Arabic, Chinese, etc. |
License | MIT |
Author | Microsoft |
Paper | DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing (arXiv:2111.09543) |
What is mdeberta-v3-base?
mdeberta-v3-base is the multilingual version of DeBERTaV3, which enhances the DeBERTa architecture with ELECTRA-style pre-training and gradient-disentangled embedding sharing. The model has 12 layers with a hidden size of 768 and was trained on the 2.5TB CC100 multilingual corpus, the same data used for XLM-R.
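These architecture numbers can be checked directly against the published configuration with Hugging Face transformers. A minimal sketch (assumes the transformers and sentencepiece packages are installed):

```python
from transformers import AutoConfig, AutoTokenizer

# Load the published configuration and tokenizer for mDeBERTa-v3-base
config = AutoConfig.from_pretrained("microsoft/mdeberta-v3-base")
tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")

print(config.num_hidden_layers)  # 12 transformer layers
print(config.hidden_size)        # 768 hidden dimensions
print(config.vocab_size)         # ~250K SentencePiece tokens

# Embedding parameters roughly equal vocab_size * hidden_size,
# which accounts for the ~190M embedding parameters quoted above
print(config.vocab_size * config.hidden_size / 1e6, "M embedding parameters")
```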
Implementation Details
This model builds on DeBERTa's disentangled attention and enhanced mask decoder mechanisms. It uses a 250K-token SentencePiece vocabulary shared across languages and shows strong performance on cross-lingual transfer tasks.
- Utilizes ELECTRA-style pre-training methodology
- Implements gradient-disentangled embedding sharing
- Features enhanced mask decoder for improved performance
- Supports zero-shot cross-lingual transfer
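A minimal sketch of loading the backbone and producing sentence representations for inputs in different languages, which is the usual starting point for zero-shot cross-lingual transfer. The example sentences and the mean-pooling choice are illustrative assumptions, not part of the original model card:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# The same encoder handles all languages through one shared vocabulary
sentences = [
    "The weather is nice today.",   # English
    "Das Wetter ist heute schön.",  # German
    "今天天气很好。",                 # Chinese
]

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state          # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1)       # ignore padding positions
    embeddings = (hidden * mask).sum(1) / mask.sum(1)  # mean-pooled sentence vectors

# Cosine similarity between the English sentence and the other two
sims = torch.nn.functional.cosine_similarity(embeddings[0:1], embeddings[1:])
print(sims)
```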
Core Capabilities
- Multilingual text processing across the 100+ languages covered by CC100
- Strong performance on the XNLI benchmark (79.8% average zero-shot accuracy across its 15 languages)
- Zero-shot cross-lingual transfer learning
- Fill-mask task optimization
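To illustrate the shared multilingual vocabulary and the mask token used for fill-mask style inputs, here is a small tokenizer sketch; the example strings are assumptions for illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")

# One shared SentencePiece vocabulary covers all supported languages
for text in ["Paris is the capital of France.",
             "باريس هي عاصمة فرنسا.",
             "巴黎是法国的首都。"]:
    print(tokenizer.tokenize(text))

# The special token used for fill-mask style inputs
print(tokenizer.mask_token, tokenizer.mask_token_id)
```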
Frequently Asked Questions
Q: What makes this model unique?
The model combines DeBERTa's disentangled attention mechanism with ELECTRA-style pre-training, outperforming XLM-R-base on all 15 XNLI languages in zero-shot cross-lingual transfer.
Q: What are the recommended use cases?
The model excels in cross-lingual natural language understanding tasks, particularly when training on English data and transferring to other languages. It's ideal for multilingual text classification, fill-mask tasks, and general NLU applications.
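As a sketch of that workflow (fine-tune on English, evaluate zero-shot on another language), the snippet below adds a classification head and runs one illustrative training step. The example sentences, label mapping, and hyperparameters are assumptions; a real setup would fine-tune on a full NLI dataset such as MNLI before zero-shot evaluation:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Three NLI labels (entailment / neutral / contradiction) are assumed here
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Tiny illustrative English training batch of (premise, hypothesis, label)
premises = ["A man is playing a guitar.", "A child is reading a book."]
hypotheses = ["Someone is making music.", "The child is asleep."]
labels = torch.tensor([0, 2])  # entailment, contradiction

batch = tokenizer(premises, hypotheses, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One training step on English data
model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()

# Zero-shot prediction on a non-English pair with no further training
model.eval()
with torch.no_grad():
    test = tokenizer("Un homme joue de la guitare.",
                     "Quelqu'un fait de la musique.",
                     return_tensors="pt")
    print(model(**test).logits.argmax(-1))
```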