mdeberta-v3-base
Property | Value |
---|---|
Parameters | 86M (backbone) + 190M (embedding) |
Languages | 100+ languages covered by the CC100 corpus, including English, Arabic, Chinese, etc. |
License | MIT |
Author | Microsoft |
Paper | DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing (arXiv:2111.09543) |
What is mdeberta-v3-base?
mdeberta-v3-base is the multilingual version of DeBERTaV3, which enhances the DeBERTa architecture with ELECTRA-style pre-training and gradient-disentangled embedding sharing. The model has 12 layers with a hidden size of 768 and was trained on the 2.5TB CC100 multilingual corpus, the same data used for XLM-R.
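These architecture numbers can be checked directly against the published configuration with Hugging Face transformers. A minimal sketch (assumes the transformers and sentencepiece packages are installed):

```python
from transformers import AutoConfig, AutoTokenizer

# Load the published configuration and tokenizer for mDeBERTa-v3-base
config = AutoConfig.from_pretrained("microsoft/mdeberta-v3-base")
tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")

print(config.num_hidden_layers)  # 12 transformer layers
print(config.hidden_size)        # 768 hidden dimensions
print(config.vocab_size)         # ~250K SentencePiece tokens

# Embedding parameters roughly equal vocab_size * hidden_size,
# which accounts for the ~190M embedding parameters quoted above
print(config.vocab_size * config.hidden_size / 1e6, "M embedding parameters")
```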
Implementation Details
This model builds on DeBERTa's disentangled attention and enhanced mask decoder mechanisms. It uses a 250K-token SentencePiece vocabulary shared across languages and shows strong performance on cross-lingual transfer tasks.
- Utilizes ELECTRA-style pre-training methodology
- Implements gradient-disentangled embedding sharing
- Features enhanced mask decoder for improved performance
- Supports zero-shot cross-lingual transfer
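A minimal sketch of loading the backbone and producing sentence representations for inputs in different languages, which is the usual starting point for zero-shot cross-lingual transfer. The example sentences and the mean-pooling choice are illustrative assumptions, not part of the original model card:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# The same encoder handles all languages through one shared vocabulary
sentences = [
    "The weather is nice today.",   # English
    "Das Wetter ist heute schön.",  # German
    "今天天气很好。",                 # Chinese
]

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state          # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1)       # ignore padding positions
    embeddings = (hidden * mask).sum(1) / mask.sum(1)  # mean-pooled sentence vectors

# Cosine similarity between the English sentence and the other two
sims = torch.nn.functional.cosine_similarity(embeddings[0:1], embeddings[1:])
print(sims)
```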
Core Capabilities
- Multilingual text processing across the 100+ languages covered by CC100
- Strong performance on the XNLI benchmark (79.8% average zero-shot accuracy across its 15 languages)
- Zero-shot cross-lingual transfer learning
- Fill-mask task optimization
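To illustrate the shared multilingual vocabulary and the mask token used for fill-mask style inputs, here is a small tokenizer sketch; the example strings are assumptions for illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")

# One shared SentencePiece vocabulary covers all supported languages
for text in ["Paris is the capital of France.",
             "باريس هي عاصمة فرنسا.",
             "巴黎是法国的首都。"]:
    print(tokenizer.tokenize(text))

# The special token used for fill-mask style inputs
print(tokenizer.mask_token, tokenizer.mask_token_id)
```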
Frequently Asked Questions
Q: What makes this model unique?
The model combines DeBERTa's disentangled attention mechanism with ELECTRA-style pre-training, outperforming XLM-R-base on all 15 XNLI languages in zero-shot cross-lingual transfer.
Q: What are the recommended use cases?
The model excels in cross-lingual natural language understanding tasks, particularly when training on English data and transferring to other languages. It's ideal for multilingual text classification, fill-mask tasks, and general NLU applications.
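As a sketch of that workflow (fine-tune on English, evaluate zero-shot on another language), the snippet below adds a classification head and runs one illustrative training step. The example sentences, label mapping, and hyperparameters are assumptions; a real setup would fine-tune on a full NLI dataset such as MNLI before zero-shot evaluation:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Three NLI labels (entailment / neutral / contradiction) are assumed here
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Tiny illustrative English training batch of (premise, hypothesis, label)
premises = ["A man is playing a guitar.", "A child is reading a book."]
hypotheses = ["Someone is making music.", "The child is asleep."]
labels = torch.tensor([0, 2])  # entailment, contradiction

batch = tokenizer(premises, hypotheses, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One training step on English data
model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()

# Zero-shot prediction on a non-English pair with no further training
model.eval()
with torch.no_grad():
    test = tokenizer("Un homme joue de la guitare.",
                     "Quelqu'un fait de la musique.",
                     return_tensors="pt")
    print(model(**test).logits.argmax(-1))
```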