DeBERTa-v3-large-mnli-fever-anli-ling-wanli
| Property | Value |
|---|---|
| Parameter Count | 435M |
| License | MIT |
| Paper | DeBERTa-v3 Paper |
| Training Data | 885,242 NLI pairs |
| Best Performance | 91.2% (MultiNLI matched) |
What is DeBERTa-v3-large-mnli-fever-anli-ling-wanli?
This is a state-of-the-art Natural Language Inference (NLI) model built on Microsoft's DeBERTa-v3-large architecture. The model has been fine-tuned on multiple high-quality datasets including MultiNLI, Fever-NLI, ANLI, LingNLI, and WANLI, making it particularly robust for zero-shot classification tasks.
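As a minimal sketch of zero-shot usage with the Hugging Face `transformers` pipeline (the model id below is an assumption based on this card's title; substitute the actual repository name):

```python
# Sketch of zero-shot classification with the Hugging Face pipeline.
# MODEL_ID is an assumption based on this card's title; adjust as needed.
MODEL_ID = "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"

def top_label(result: dict) -> str:
    """Return the highest-scoring label from a zero-shot pipeline result.

    The pipeline returns {"labels": [...], "scores": [...]} with both lists
    sorted by descending score, so the first label is the prediction.
    """
    return result["labels"][0]

if __name__ == "__main__":
    # Imported here so the helper above is usable without transformers installed.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model=MODEL_ID)
    result = classifier(
        "Angela Merkel is a politician in Germany and leader of the CDU",
        candidate_labels=["politics", "economy", "entertainment", "environment"],
    )
    print(top_label(result), round(result["scores"][0], 3))
```

Under the hood, the pipeline converts each candidate label into an NLI hypothesis (e.g. "This example is about politics.") and scores it against the input text as the premise.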
Implementation Details
The model leverages advanced training techniques including mixed-precision training, weight decay, and gradient accumulation. It was trained for 4 epochs with a learning rate of 5e-06, and its strongest gains were on the challenging ANLI benchmark, where it outperformed the previous SOTA by 8.3%.
- Uses disentangled attention mechanism
- Implements RTD (Replaced Token Detection) pre-training objective
- Supports both single-label and multi-label classification
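For direct hypothesis-premise scoring, the three output logits can be mapped to probabilities with a softmax. A sketch, again assuming the model id from this card's title and the entailment/neutral/contradiction label order conventional for this model family (verify against the model's own `id2label` config):

```python
import math

# Assumed label order; confirm against the model config's id2label mapping.
NLI_LABELS = ("entailment", "neutral", "contradiction")

def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_probabilities(logits, labels=NLI_LABELS):
    """Map the three raw NLI logits to a {label: probability} dict."""
    return dict(zip(labels, softmax(logits)))

if __name__ == "__main__":
    # Requires `transformers` and `torch`; model id is an assumption.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_id = "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)

    premise = "I first thought I liked the movie, but on reflection it was disappointing."
    hypothesis = "The movie was good."
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    print(label_probabilities(logits))
```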
Core Capabilities
- Zero-shot text classification with high accuracy
- Natural Language Inference tasks
- Hypothesis-premise pair analysis
- General English-language understanding
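In the multi-label setting (e.g. `pipeline(..., multi_label=True)`), each candidate label receives an independent score, so final selection reduces to thresholding. A small sketch, with the threshold value chosen arbitrarily for illustration:

```python
def select_labels(scores: dict, threshold: float = 0.5) -> list:
    """Return all labels whose independent score clears the threshold.

    With multi-label zero-shot classification, each label is scored on its
    own (entailment vs. contradiction), so several labels, or none at all,
    may apply to the same text. Results are ordered by descending score.
    """
    return sorted(
        (label for label, score in scores.items() if score >= threshold),
        key=lambda label: -scores[label],
    )

if __name__ == "__main__":
    # Hypothetical per-label scores, as a multi-label pipeline might return.
    scores = {"politics": 0.92, "economy": 0.71, "sports": 0.04}
    print(select_labels(scores))  # -> ['politics', 'economy']
```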
Frequently Asked Questions
Q: What makes this model unique?
This model combines several innovations in transformer architecture with comprehensive training on multiple high-quality NLI datasets, achieving state-of-the-art performance across various benchmarks, particularly on the challenging ANLI dataset.
Q: What are the recommended use cases?
The model excels in zero-shot classification tasks, text entailment analysis, and general natural language understanding applications. It's particularly suitable for scenarios where traditional supervised learning isn't feasible due to data limitations.