DeBERTa Small Long NLI
| Property | Value |
|---|---|
| Parameter Count | 142M |
| License | Apache 2.0 |
| Paper | arXiv:2301.05948 |
| Context Length | 1680 tokens |
| Training Duration | 14 days on Nvidia A30 |
What is deberta-small-long-nli?
DeBERTa-small-long-nli is a specialized variant of DeBERTa-v3-small, extensively fine-tuned on over 600 NLP tasks with a focus on natural language inference (NLI) and zero-shot classification. The model features an extended context length of 1680 tokens and has been trained for 250,000 steps with particular emphasis on long NLI tasks.
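For a quick sense of the zero-shot workflow, here is a minimal sketch using the standard Hugging Face zero-shot-classification pipeline. The Hub identifier `tasksource/deberta-small-long-nli`, the example text, and the candidate labels are assumptions for illustration, not taken from this page.

```python
# Minimal zero-shot classification sketch with Hugging Face transformers.
# The model identifier below is an assumption; substitute the actual repository id.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="tasksource/deberta-small-long-nli",  # assumed Hub id
)

text = "The quarterly report shows revenue growth of 12% year over year."
labels = ["finance", "sports", "politics"]

result = classifier(text, candidate_labels=labels)
print(result["labels"][0], result["scores"][0])  # top label and its score
```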
Implementation Details
The model was trained with a batch size of 384 and a peak learning rate of 2e-5. Each task is assigned its own CLS embedding, which is dropped with 10% probability during training to keep the model flexible. Classification heads are shared across tasks with matching label sets, which improves efficiency and transfer learning.
- Trained on diverse datasets including HelpSteer, logical reasoning tasks, and RLHF data
- Achieves 70% zero-shot validation accuracy on WNLI
- Implements shared classification layers for multiple-choice tasks
- Supports context lengths of up to 1680 tokens (see the sketch after this list)
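The sketch below illustrates the long-context point above: a premise/hypothesis pair is truncated to the 1680-token window before classification. The Hub identifier and the placeholder inputs are assumptions; the label mapping is read from the model config rather than hard-coded.

```python
# Sketch of NLI inference over a long premise, truncated to the 1680-token window.
# "tasksource/deberta-small-long-nli" is an assumed Hub identifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tasksource/deberta-small-long-nli"  # assumed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

long_document = " ".join(["A very long passage of text."] * 400)  # placeholder long premise
hypothesis = "The passage discusses text."

# Truncate the pair to the model's 1680-token context length.
inputs = tokenizer(long_document, hypothesis, truncation=True, max_length=1680, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # e.g. entailment / neutral / contradiction
```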
Core Capabilities
- Zero-shot classification for arbitrary labels
- Natural language inference tasks
- Long-context document analysis
- Reward modeling for reinforcement learning (see the sketch after this list)
- Fine-tuning backbone for specialized tasks
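One hedged reading of the reward-modeling capability is to treat the entailment probability between a prompt-plus-response pair and a quality criterion as a scalar reward. The sketch below is illustrative only: the model identifier, the criterion sentence, and the prompt formatting are assumptions, not a documented recipe.

```python
# Hedged sketch: using the NLI head as a scalar reward signal by scoring how
# strongly a response entails an assumed quality criterion.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tasksource/deberta-small-long-nli"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

def entailment_reward(prompt: str, response: str) -> float:
    """Return the entailment probability that the response addresses the prompt."""
    premise = f"Question: {prompt}\nAnswer: {response}"
    hypothesis = "The answer addresses the question accurately."  # assumed criterion
    inputs = tokenizer(premise, hypothesis, truncation=True, max_length=1680, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # Look up the entailment index from the config rather than hard-coding it.
    entail_idx = next(i for i, name in model.config.id2label.items() if "entail" in name.lower())
    return probs[entail_idx].item()

print(entailment_reward("What is the capital of France?", "Paris is the capital of France."))
```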
Frequently Asked Questions
Q: What makes this model unique?
The model's uniqueness lies in its extensive multi-task training across more than 600 tasks and its optimized performance on long-context NLI. It demonstrates strong zero-shot capabilities and handles a wide range of classification scenarios without additional training.
Q: What are the recommended use cases?
The model excels in zero-shot classification, natural language inference, and as a foundation for fine-tuning task-specific models. It's particularly suitable for long-document analysis and can serve as a reward model backbone for reinforcement learning applications.
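As a rough illustration of the fine-tuning use case, the sketch below reuses the checkpoint as a backbone for a new classification task by replacing the NLI head. The Hub identifier, label count, and toy batch are placeholders, not a prescribed setup.

```python
# Hedged sketch of fine-tuning the checkpoint on a new 4-label task.
# The original NLI head is discarded and a fresh head is trained.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tasksource/deberta-small-long-nli"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=4,                  # new task-specific head
    ignore_mismatched_sizes=True,  # drop the original NLI head weights
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy batch standing in for a real dataset.
texts = ["example document one", "example document two"]
labels = torch.tensor([0, 3])
batch = tokenizer(texts, padding=True, truncation=True, max_length=1680, return_tensors="pt")

model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(outputs.loss))
```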