DistilBART-XSUM-12-6
| Property | Value |
|---|---|
| Author | sshleifer |
| Parameter Count | 306M |
| Model Type | Summarization |
| Hugging Face | [Model Repository](https://huggingface.co/sshleifer/distilbart-xsum-12-6) |
| Inference Time | 137 ms per sample |
| Speedup vs Baseline | 1.68x |
What is distilbart-xsum-12-6?
DistilBART-XSUM-12-6 is a knowledge-distilled version of the BART model, optimized for extreme summarization tasks. It achieves ROUGE-2 and ROUGE-L scores of 22.12 and 36.99 respectively, surpassing its larger baseline model while using fewer parameters and running inference faster.
Implementation Details
The model architecture features 12 encoder layers and 6 decoder layers, hence the "12-6" naming convention. It is loaded with BartForConditionalGeneration.from_pretrained and offers a strong balance between model size and performance (see the usage sketch after the list below).
- 306M parameters (a ~25% reduction from the 406M baseline)
- 1.68x inference speedup over BART-large-xsum
- Optimized for the XSUM dataset
- 137 ms inference time per sample
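A minimal usage sketch, assuming only that the checkpoint is hosted on the Hugging Face Hub as sshleifer/distilbart-xsum-12-6; the sample article and the generation settings (num_beams, max_length) are illustrative rather than values from the model card:

```python
# Load the distilled checkpoint and summarize a single article.
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "sshleifer/distilbart-xsum-12-6"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

article = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and was the tallest man-made structure in the world for 41 years."
)

# Tokenize, generate with beam search, and decode a short summary.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```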
Core Capabilities
- Extreme summarization with state-of-the-art performance
- Efficient inference with reduced computational requirements
- Higher ROUGE scores than the baseline model
- Maintains quality while reducing model size
Frequently Asked Questions
Q: What makes this model unique?
This model outperforms its larger parent while being significantly more efficient: with 306M parameters against the baseline's 406M, it delivers better ROUGE scores and 1.68x faster inference.
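The speedup figure can be sanity-checked on your own hardware. A rough timing sketch, assuming batch size 1 and that averaging wall-clock time over a few runs is representative; facebook/bart-large-xsum serves as the baseline checkpoint, and absolute numbers will differ from the 137 ms figure depending on hardware:

```python
import time

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

def mean_latency(model_name: str, text: str, runs: int = 5) -> float:
    """Average generation latency in seconds over `runs` repetitions."""
    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name).eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        model.generate(**inputs, max_length=60)  # warm-up run, not timed
        start = time.perf_counter()
        for _ in range(runs):
            model.generate(**inputs, max_length=60)
    return (time.perf_counter() - start) / runs

text = "Some long news article text to summarize..."
distilled = mean_latency("sshleifer/distilbart-xsum-12-6", text)
baseline = mean_latency("facebook/bart-large-xsum", text)
print(f"speedup: {baseline / distilled:.2f}x")
```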
Q: What are the recommended use cases?
The model is specifically designed for extreme summarization tasks, particularly those similar to the XSUM dataset. It is well suited to applications that need concise, high-quality summaries under tight latency or compute budgets.
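For quick experiments, the high-level summarization pipeline is enough. A minimal sketch; the max_length and min_length settings here are illustrative choices for XSUM-style one-sentence summaries, not values from the model card:

```python
# Summarize text with the high-level pipeline API.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-xsum-12-6")
result = summarizer(
    "Long input document text goes here...",
    max_length=60,
    min_length=10,
    do_sample=False,  # deterministic beam/greedy decoding
)
print(result[0]["summary_text"])
```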