deberta-v3-small

DeBERTa-v3-small: Microsoft's 44M-parameter language model combining ELECTRA-style pre-training with gradient-disentangled embedding sharing.

  • Parameters: 44M (backbone) + 98M (embedding)
  • License: MIT
  • Author: Microsoft
  • Paper: DeBERTaV3 (arXiv:2111.09543)

What is deberta-v3-small?

DeBERTa-v3-small is a compact language model from the third generation of Microsoft's DeBERTa architecture. It has 6 layers with a hidden size of 768, and it combines ELECTRA-style pre-training with gradient-disentangled embedding sharing. Despite its small backbone, it scores 82.8/80.4 (F1/EM) on SQuAD 2.0 and 88.3/87.7 accuracy on MNLI-m/mm.
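
To make this concrete, here is a minimal loading sketch using the Hugging Face `transformers` library (an assumption of this example, not part of the model card; the tokenizer also requires `sentencepiece`):

```python
# Minimal sketch: loading microsoft/deberta-v3-small as a feature encoder.
# Assumes `torch`, `transformers`, and `sentencepiece` are installed.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
model = AutoModel.from_pretrained("microsoft/deberta-v3-small")

inputs = tokenizer("DeBERTa-v3-small is a compact NLU encoder.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Hidden states have width 768, matching the model's hidden size.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```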

Implementation Details

The model incorporates several architectural elements that set it apart from its predecessors (the configuration sketch after this list verifies the headline numbers):

  • 6-layer architecture with a hidden size of 768
  • 128K-token vocabulary
  • Enhanced mask decoder system
  • Disentangled attention mechanism
  • ELECTRA-style pre-training approach
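
The first three items can be read directly off the published configuration; this sketch assumes only that `transformers` is installed:

```python
# Sketch: inspecting the released config of microsoft/deberta-v3-small.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/deberta-v3-small")
print(config.num_hidden_layers)  # 6
print(config.hidden_size)        # 768
print(config.vocab_size)         # 128100 (~128K tokens)
```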

Core Capabilities

  • Fill-mask operations (see the hedged sketch after this list)
  • Natural Language Understanding (NLU) tasks
  • Question answering (demonstrated by SQuAD performance)
  • Natural Language Inference (shown by MNLI results)
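
As an illustration of the fill-mask entry above: the snippet below runs, but because the v3 checkpoints were pre-trained with ELECTRA-style replaced-token detection rather than masked-language modelling, the released weights may not include a trained MLM head, and raw predictions can be unreliable; fine-tuning is the usual path:

```python
# Illustrative only: fill-mask with microsoft/deberta-v3-small.
# The MLM head may be freshly initialized in the released checkpoint,
# so treat these predictions as a smoke test, not a benchmark.
from transformers import pipeline

fill = pipeline("fill-mask", model="microsoft/deberta-v3-small")
for candidate in fill("Paris is the [MASK] of France."):
    print(candidate["token_str"], round(candidate["score"], 4))
```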

Frequently Asked Questions

Q: What makes this model unique?

DeBERTa-v3-small stands out for its efficient architecture that combines the benefits of ELECTRA-style pre-training with gradient-disentangled embedding sharing, achieving strong performance despite its relatively small size of 44M parameters.

Q: What are the recommended use cases?

The model is particularly well-suited for natural language understanding tasks, including question answering, natural language inference, and text classification. It offers a good balance between model size and performance, making it suitable for production environments with resource constraints.
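
Below is a sketch of that production path: fine-tuning the checkpoint for an NLI-style classification task. The three-label setup and the example sentence pair are illustrative assumptions, not part of the model card:

```python
# Sketch: setting up microsoft/deberta-v3-small for sequence classification.
# The classification head is newly initialized and must be fine-tuned.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-small",
    num_labels=3,  # e.g. entailment / neutral / contradiction, as in MNLI
)

# NLI-style input: a premise/hypothesis sentence pair.
batch = tokenizer("A man is playing a guitar.",
                  "Someone is making music.",
                  return_tensors="pt")
labels = torch.tensor([0])  # hypothetical gold label for this sketch

outputs = model(**batch, labels=labels)
print(outputs.loss.item(), outputs.logits.shape)  # scalar loss, (1, 3) logits
```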
