DeBERTa Small Long NLI
| Property | Value |
|---|---|
| Parameter Count | 142M |
| License | Apache 2.0 |
| Paper | arXiv:2301.05948 |
| Context Length | 1680 tokens |
| Training Duration | 14 days on Nvidia A30 |
What is deberta-small-long-nli?
DeBERTa-small-long-nli is a specialized variant of DeBERTa-v3-small, extensively fine-tuned on over 600 NLP tasks with a focus on natural language inference (NLI) and zero-shot classification. The model features an extended context length of 1680 tokens and has been trained for 250,000 steps with particular emphasis on long NLI tasks.
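For a quick sense of the zero-shot workflow, here is a minimal sketch using the standard Hugging Face zero-shot-classification pipeline. The Hub identifier `tasksource/deberta-small-long-nli`, the example text, and the candidate labels are assumptions for illustration, not taken from this page.

```python
# Minimal zero-shot classification sketch with Hugging Face transformers.
# The model identifier below is an assumption; substitute the actual repository id.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="tasksource/deberta-small-long-nli",  # assumed Hub id
)

text = "The quarterly report shows revenue growth of 12% year over year."
labels = ["finance", "sports", "politics"]

result = classifier(text, candidate_labels=labels)
print(result["labels"][0], result["scores"][0])  # top label and its score
```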
Implementation Details
The model was trained with a batch size of 384 and a peak learning rate of 2e-5. Each task is assigned its own CLS embedding, which is dropped with 10% probability during training to keep the model flexible. Classification heads are shared across tasks with matching label sets, which improves efficiency and transfer learning.
- Trained on diverse datasets including HelpSteer, logical reasoning tasks, and RLHF data
- Achieves 70% zero-shot validation accuracy on WNLI
- Implements shared classification layers for multiple-choice tasks
- Supports context lengths of up to 1680 tokens (see the sketch after this list)
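The sketch below illustrates the long-context point above: a premise/hypothesis pair is truncated to the 1680-token window before classification. The Hub identifier and the placeholder inputs are assumptions; the label mapping is read from the model config rather than hard-coded.

```python
# Sketch of NLI inference over a long premise, truncated to the 1680-token window.
# "tasksource/deberta-small-long-nli" is an assumed Hub identifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tasksource/deberta-small-long-nli"  # assumed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

long_document = " ".join(["A very long passage of text."] * 400)  # placeholder long premise
hypothesis = "The passage discusses text."

# Truncate the pair to the model's 1680-token context length.
inputs = tokenizer(long_document, hypothesis, truncation=True, max_length=1680, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # e.g. entailment / neutral / contradiction
```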
Core Capabilities
- Zero-shot classification for arbitrary labels
- Natural language inference tasks
- Long-context document analysis
- Reward modeling for reinforcement learning (see the sketch after this list)
- Fine-tuning backbone for specialized tasks
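One hedged reading of the reward-modeling capability is to treat the entailment probability between a prompt-plus-response pair and a quality criterion as a scalar reward. The sketch below is illustrative only: the model identifier, the criterion sentence, and the prompt formatting are assumptions, not a documented recipe.

```python
# Hedged sketch: using the NLI head as a scalar reward signal by scoring how
# strongly a response entails an assumed quality criterion.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tasksource/deberta-small-long-nli"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

def entailment_reward(prompt: str, response: str) -> float:
    """Return the entailment probability that the response addresses the prompt."""
    premise = f"Question: {prompt}\nAnswer: {response}"
    hypothesis = "The answer addresses the question accurately."  # assumed criterion
    inputs = tokenizer(premise, hypothesis, truncation=True, max_length=1680, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # Look up the entailment index from the config rather than hard-coding it.
    entail_idx = next(i for i, name in model.config.id2label.items() if "entail" in name.lower())
    return probs[entail_idx].item()

print(entailment_reward("What is the capital of France?", "Paris is the capital of France."))
```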
Frequently Asked Questions
Q: What makes this model unique?
The model's uniqueness lies in its extensive multi-task training across more than 600 tasks and its optimized performance on long-context NLI. It demonstrates strong zero-shot capabilities and handles a wide range of classification scenarios without additional training.
Q: What are the recommended use cases?
The model excels in zero-shot classification, natural language inference, and as a foundation for fine-tuning task-specific models. It's particularly suitable for long-document analysis and can serve as a reward model backbone for reinforcement learning applications.
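As a rough illustration of the fine-tuning use case, the sketch below reuses the checkpoint as a backbone for a new classification task by replacing the NLI head. The Hub identifier, label count, and toy batch are placeholders, not a prescribed setup.

```python
# Hedged sketch of fine-tuning the checkpoint on a new 4-label task.
# The original NLI head is discarded and a fresh head is trained.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tasksource/deberta-small-long-nli"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=4,                  # new task-specific head
    ignore_mismatched_sizes=True,  # drop the original NLI head weights
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy batch standing in for a real dataset.
texts = ["example document one", "example document two"]
labels = torch.tensor([0, 3])
batch = tokenizer(texts, padding=True, truncation=True, max_length=1680, return_tensors="pt")

model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(outputs.loss))
```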