mdeberta-v3-base

Maintained by: microsoft

Parameters: 86M (backbone) + 190M (embeddings)
Languages: 100+ (the CC100 training corpus), including English, Arabic, and Chinese
License: MIT
Author: Microsoft
Paper: DeBERTaV3 (arXiv:2111.09543)

What is mdeberta-v3-base?

mdeberta-v3-base is the multilingual version of DeBERTaV3, which enhances the DeBERTa architecture with ELECTRA-style pre-training and gradient-disentangled embedding sharing. The model has 12 layers with a hidden size of 768 and was trained on 2.5TB of CC100 multilingual data, the same corpus used for XLM-R.

Implementation Details

This model builds upon DeBERTa by combining disentangled attention with an enhanced mask decoder. It employs a 250K-token SentencePiece vocabulary and delivers strong zero-shot cross-lingual transfer; the sketch after the list below shows how to inspect these settings in code.

  • Utilizes ELECTRA-style pre-training methodology
  • Implements gradient-disentangled embedding sharing
  • Features enhanced mask decoder for improved performance
  • Supports zero-shot cross-lingual transfer
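
To make these settings concrete, here is a minimal sketch (assuming the `transformers` and `sentencepiece` packages are installed) that loads the checkpoint and inspects the configuration and vocabulary described above:

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Load the multilingual DeBERTa-v3 checkpoint from the Hugging Face Hub.
model_name = "microsoft/mdeberta-v3-base"
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)  # SentencePiece-based
model = AutoModel.from_pretrained(model_name)

print(config.num_hidden_layers)  # 12 transformer layers
print(config.hidden_size)        # 768-dimensional hidden states
print(len(tokenizer))            # ~250K-token multilingual vocabulary

# Encode text in any covered language and pull contextual embeddings.
inputs = tokenizer("DeBERTa ist ein vortrainiertes Sprachmodell.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```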

Core Capabilities

  • Multilingual text processing across the 100+ languages of the CC100 corpus
  • 79.8% average zero-shot accuracy on the XNLI benchmark (vs. 76.2% for XLM-R-base)
  • Zero-shot cross-lingual transfer learning (see the sketch after this list)
  • Fill-mask task optimization
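
As an illustration of the zero-shot transfer workflow, the sketch below (a hypothetical setup, not an official recipe) attaches a three-way NLI classification head; after fine-tuning on English premise/hypothesis pairs, the same weights can score pairs in other languages with no additional training:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Three labels for NLI: entailment / neutral / contradiction. The head is
# randomly initialized here; in practice it is fine-tuned on English data first.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

pairs = [
    ("A man is playing a guitar.", "A person is making music."),           # English
    ("Un homme joue de la guitare.", "Une personne fait de la musique."),  # French
]
for premise, hypothesis in pairs:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    print(logits.softmax(dim=-1))  # class probabilities per pair
```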

Frequently Asked Questions

Q: What makes this model unique?

The model combines DeBERTa's disentangled attention mechanism with ELECTRA-style pre-training (replaced token detection) and gradient-disentangled embedding sharing, and it outperforms XLM-R-base on all 15 XNLI languages.
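
For reference, the disentangled attention score defined in the original DeBERTa paper decomposes each query-key interaction into content-to-content, content-to-position, and position-to-content terms, where δ(i, j) is the bucketed relative distance between positions i and j:

```latex
\tilde{A}_{i,j}
  = Q_i^{c} {K_j^{c}}^{\top}                % content-to-content
  + Q_i^{c} {K_{\delta(i,j)}^{r}}^{\top}    % content-to-position
  + K_j^{c} {Q_{\delta(j,i)}^{r}}^{\top},   % position-to-content
\qquad
A = \mathrm{softmax}\!\left(\frac{\tilde{A}}{\sqrt{3d}}\right)
```

Here Q^c and K^c project the content embeddings, Q^r and K^r project the relative-position embeddings, and the √(3d) scale accounts for the three score terms.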

Q: What are the recommended use cases?

The model excels in cross-lingual natural language understanding tasks, particularly when training on English data and transferring to other languages. It's ideal for multilingual text classification, fill-mask tasks, and general NLU applications.
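
This pattern can be made end to end with the `xnli` dataset from the Hugging Face `datasets` library; the sketch below trains on the English split and evaluates zero-shot on Swahili (output path and hyperparameters are illustrative assumptions, not tuned or official values):

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=256)

# Fine-tune on English NLI pairs only ...
train = load_dataset("xnli", "en", split="train").map(tokenize, batched=True)
# ... and evaluate zero-shot on another language, e.g. Swahili.
test = load_dataset("xnli", "sw", split="test").map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="mdeberta-xnli-en",    # illustrative path
    per_device_train_batch_size=16,   # illustrative hyperparameters
    learning_rate=2e-5,
    num_train_epochs=3,
)
trainer = Trainer(model=model, args=args, train_dataset=train,
                  eval_dataset=test, tokenizer=tokenizer)
trainer.train()
print(trainer.evaluate())  # add a compute_metrics fn to report accuracy
```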
