DistilBART-MNLI-12-1
| Property | Value |
|---|---|
| Author | valhalla |
| Task Type | Zero-Shot Classification |
| Framework | PyTorch |
| Downloads | 26,869 |
What is distilbart-mnli-12-1?
DistilBART-MNLI-12-1 is a distilled version of bart-large-mnli created with the No Teacher Distillation technique. It keeps all 12 encoder layers but reduces the decoder to a single layer, offering an efficient alternative that preserves most of the original's performance: 87.08% accuracy on matched and 87.5% on mismatched MNLI, versus 89.9% and 90.01% for the original model.
Implementation Details
The model follows the No Teacher Distillation recipe: alternating layers from bart-large-mnli are copied into a smaller student, which is then fine-tuned on the same MNLI data (a minimal sketch follows the list below). This yields a strong balance between efficiency and performance, with only a small accuracy drop despite the much smaller decoder.
- 12 encoder layers retained from bart-large-mnli
- Single decoder layer for efficient processing
- Built on PyTorch framework
- Specialized for zero-shot classification tasks
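The snippet below is only a minimal sketch of the layer-copying idea behind this distillation, not the authors' actual scripts; the attribute paths (`model.model.encoder`, `model.model.decoder.layers`, `classification_head`) are assumptions based on the PyTorch BART implementation in transformers, and the fine-tuning step is omitted.

```python
# Minimal sketch of no-teacher distillation by layer copying.
# Assumption: the 12-1 student keeps every encoder layer plus a single
# decoder layer copied from the teacher before being fine-tuned on MNLI.
import copy
import torch.nn as nn
from transformers import BartConfig, BartForSequenceClassification

# Teacher: the full bart-large-mnli model (12 encoder / 12 decoder layers).
teacher = BartForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

# Student config: same encoder depth, decoder shrunk to one layer.
student_config = BartConfig.from_pretrained(
    "facebook/bart-large-mnli", encoder_layers=12, decoder_layers=1
)
student = BartForSequenceClassification(student_config)

# Copy the teacher's encoder wholesale, one decoder layer, and the
# classification head; fine-tuning on the same data would follow.
student.model.encoder = copy.deepcopy(teacher.model.encoder)
student.model.decoder.layers = nn.ModuleList(
    [copy.deepcopy(teacher.model.decoder.layers[0])]
)
student.classification_head = copy.deepcopy(teacher.classification_head)
```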
Core Capabilities
- Zero-shot text classification
- Natural language inference tasks
- Efficient processing with reduced parameter count
- Maintains strong performance metrics despite compression
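For zero-shot classification, the checkpoint can be loaded through the standard transformers pipeline. The example below assumes the `valhalla/distilbart-mnli-12-1` Hub id and is a minimal usage sketch rather than a production setup; the input sentence and candidate labels are illustrative.

```python
# Zero-shot classification with the distilled checkpoint.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="valhalla/distilbart-mnli-12-1",
)

result = classifier(
    "The new GPU delivers excellent frame rates at 4K resolution.",
    candidate_labels=["technology", "sports", "politics"],
)

# Labels are returned sorted by score; print the top prediction.
print(result["labels"][0], result["scores"][0])
```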
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its successful application of No Teacher Distillation, achieving near-baseline accuracy while cutting the decoder from 12 layers to 1. It offers an excellent balance between efficiency and accuracy.
Q: What are the recommended use cases?
The model is ideal for zero-shot classification tasks where efficient processing is required without significantly compromising accuracy. It's particularly suitable for production environments where resource optimization is crucial.
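Beyond the pipeline, the checkpoint can also be queried directly as an MNLI classifier for premise/hypothesis pairs. The sketch below assumes the label order used by bart-large-mnli (0 = contradiction, 1 = neutral, 2 = entailment) carries over to the distilled model; check `model.config.id2label` to confirm before relying on it.

```python
# Direct NLI inference: score a premise/hypothesis pair.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "valhalla/distilbart-mnli-12-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 3)

probs = logits.softmax(dim=-1)[0]
# Assumed MNLI label order: contradiction, neutral, entailment.
for label, p in zip(["contradiction", "neutral", "entailment"], probs.tolist()):
    print(f"{label}: {p:.3f}")
```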