DistilBART-XSUM-12-6
| Property | Value |
|---|---|
| Author | sshleifer |
| Parameter Count | 306M |
| Model Type | Summarization |
| Hugging Face | [Model Repository](https://huggingface.co/sshleifer/distilbart-xsum-12-6) |
| Inference Time | 137 ms per sample |
| Speedup vs Baseline | 1.68x |
What is distilbart-xsum-12-6?
DistilBART-XSUM-12-6 is a knowledge-distilled version of the BART model, optimized for extreme summarization tasks. It achieves ROUGE-2 and ROUGE-L scores of 22.12 and 36.99 respectively, surpassing its larger baseline model while using fewer parameters and running inference faster.
Implementation Details
The model architecture features 12 encoder layers and 6 decoder layers, hence the "12-6" naming convention. It is loaded with BartForConditionalGeneration.from_pretrained and offers a strong balance between model size and performance (see the usage sketch after the list below).
- 306M parameters (a ~25% reduction from the 406M baseline)
- 1.68x inference speedup over BART-large-xsum
- Optimized for the XSUM dataset
- 137 ms inference time per sample
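A minimal usage sketch, assuming only that the checkpoint is hosted on the Hugging Face Hub as sshleifer/distilbart-xsum-12-6; the sample article and the generation settings (num_beams, max_length) are illustrative rather than values from the model card:

```python
# Load the distilled checkpoint and summarize a single article.
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "sshleifer/distilbart-xsum-12-6"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

article = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and was the tallest man-made structure in the world for 41 years."
)

# Tokenize, generate with beam search, and decode a short summary.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```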
Core Capabilities
- Extreme summarization with state-of-the-art performance
- Efficient inference with reduced computational requirements
- Higher ROUGE scores than the baseline model
- Maintains quality while reducing model size
Frequently Asked Questions
Q: What makes this model unique?
This model outperforms its larger parent while being significantly more efficient: with 306M parameters against the baseline's 406M, it delivers better ROUGE scores and 1.68x faster inference.
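The speedup figure can be sanity-checked on your own hardware. A rough timing sketch, assuming batch size 1 and that averaging wall-clock time over a few runs is representative; facebook/bart-large-xsum serves as the baseline checkpoint, and absolute numbers will differ from the 137 ms figure depending on hardware:

```python
import time

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

def mean_latency(model_name: str, text: str, runs: int = 5) -> float:
    """Average generation latency in seconds over `runs` repetitions."""
    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name).eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        model.generate(**inputs, max_length=60)  # warm-up run, not timed
        start = time.perf_counter()
        for _ in range(runs):
            model.generate(**inputs, max_length=60)
    return (time.perf_counter() - start) / runs

text = "Some long news article text to summarize..."
distilled = mean_latency("sshleifer/distilbart-xsum-12-6", text)
baseline = mean_latency("facebook/bart-large-xsum", text)
print(f"speedup: {baseline / distilled:.2f}x")
```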
Q: What are the recommended use cases?
The model is specifically designed for extreme summarization tasks, particularly those similar to the XSUM dataset. It is well suited to applications that need concise, high-quality summaries under tight latency or compute budgets.
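For quick experiments, the high-level summarization pipeline is enough. A minimal sketch; the max_length and min_length settings here are illustrative choices for XSUM-style one-sentence summaries, not values from the model card:

```python
# Summarize text with the high-level pipeline API.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-xsum-12-6")
result = summarizer(
    "Long input document text goes here...",
    max_length=60,
    min_length=10,
    do_sample=False,  # deterministic beam/greedy decoding
)
print(result[0]["summary_text"])
```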