mdeberta-v3-base

microsoft

Multilingual DeBERTa-V3 base model with 86M backbone parameters, supporting 16 languages. It enhances the BERT architecture with disentangled attention and ELECTRA-style pre-training.

| Property | Value |
| --- | --- |
| Parameters | 86M (backbone) + 190M (embedding layer) |
| Languages | 16 languages, including English, Arabic, and Chinese |
| License | MIT |
| Author | Microsoft |
| Paper | DeBERTaV3 paper |

What is mdeberta-v3-base?

mdeberta-v3-base is a multilingual variant of the DeBERTa-V3 architecture, combining ELECTRA-style pre-training with gradient-disentangled embedding sharing. The model has 12 layers and a hidden size of 768, and was trained on 2.5TB of CC100 multilingual data.
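The layer count, hidden size, and vocabulary size above can be verified directly from the model's published configuration. A minimal sketch using the Hugging Face `transformers` library (assumed installed; this fetches only a small config JSON, not the model weights):

```python
from transformers import AutoConfig

# Download the published configuration for microsoft/mdeberta-v3-base
# (a small JSON file; no model weights are fetched).
config = AutoConfig.from_pretrained("microsoft/mdeberta-v3-base")

print(config.num_hidden_layers)  # 12 transformer layers
print(config.hidden_size)        # hidden size of 768
print(config.vocab_size)         # ~250K-token multilingual vocabulary
```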

Implementation Details

This model builds on DeBERTa by combining disentangled attention with an enhanced mask decoder. It employs a 250K-token SentencePiece vocabulary and delivers strong performance on cross-lingual transfer tasks.

  • Utilizes ELECTRA-style pre-training methodology
  • Implements gradient-disentangled embedding sharing
  • Features enhanced mask decoder for improved performance
  • Supports zero-shot cross-lingual transfer

Core Capabilities

  • Multilingual text processing across 16 languages
  • Strong zero-shot performance on the XNLI benchmark (79.8% average accuracy, vs. 76.2% for XLM-R-base)
  • Zero-shot cross-lingual transfer learning
  • Fill-mask task optimization
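The fill-mask capability listed above can be exercised through the standard `transformers` pipeline. A minimal sketch (assumes `transformers` and `sentencepiece` are installed and downloads the full model weights; note that DeBERTa-V3 was pre-trained with replaced-token detection, so the masked-LM head may be freshly initialized and the predictions here illustrate the API rather than production-quality output):

```python
from transformers import pipeline

# Fill-mask pipeline; mDeBERTa's SentencePiece tokenizer uses the
# [MASK] special token.
fill_mask = pipeline("fill-mask", model="microsoft/mdeberta-v3-base")

# One model handles text in any of its supported languages.
predictions = fill_mask("Paris is the [MASK] of France.")
for p in predictions:
    # Each prediction is a dict with the candidate token and its score.
    print(f"{p['token_str']!r}: {p['score']:.3f}")
```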

Frequently Asked Questions

Q: What makes this model unique?

The model combines DeBERTa's disentangled attention mechanism with ELECTRA-style pre-training and gradient-disentangled embedding sharing, outperforming XLM-R-base on zero-shot cross-lingual transfer across all supported languages.

Q: What are the recommended use cases?

The model excels at cross-lingual natural language understanding, particularly when fine-tuned on English data and evaluated zero-shot on other languages. It is well suited to multilingual text classification, fill-mask tasks, and general NLU applications.
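The train-on-English, transfer-everywhere recipe can be sketched as follows. This is a minimal outline, not a full training script: the classification head attached here is randomly initialized and must be fine-tuned (e.g. on English MNLI) before zero-shot evaluation on other languages makes sense. Assumes `transformers`, `sentencepiece`, and `torch` are installed; loading downloads the full backbone weights:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=3 for NLI (entailment / neutral / contradiction); the head
# is freshly initialized and needs English fine-tuning first.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=3
)
model.eval()

# After English fine-tuning, the same weights score non-English pairs:
premise = "Er ging gestern ins Kino."     # German: "He went to the cinema yesterday."
hypothesis = "Er hat einen Film gesehen." # German: "He watched a movie."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted NLI label index
```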
