FLAN-T5-Base-SAMSum
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Base Model | google/flan-t5-base |
| Training Dataset | SAMSum |
| Primary Task | Text Summarization |
What is flan-t5-base-samsum?
FLAN-T5-Base-SAMSum is a text summarization model built on the FLAN-T5-Base architecture and fine-tuned on the SAMSum dataset. With a ROUGE1 score of 47.23, it is particularly effective for dialogue and conversation summarization.
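A minimal inference sketch is shown below. The repo id is a placeholder assumption and should be replaced with the actual published checkpoint; the example dialogue is illustrative only.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "philschmid/flan-t5-base-samsum"  # placeholder repo id; substitute the actual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# A short SAMSum-style dialogue to summarize
dialogue = (
    "Anna: Are we still on for lunch tomorrow?\n"
    "Ben: Yes, 12:30 at the usual place.\n"
    "Anna: Perfect, see you then!"
)

inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```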
Implementation Details
The model was fine-tuned with the Adam optimizer, a linear learning rate scheduler, and a learning rate of 5e-05. Training ran for 5 epochs with a batch size of 8 for both training and evaluation, with performance metrics improving consistently throughout; a configuration sketch follows the list below.
- Training conducted over 9,210 steps
- Achieved final ROUGE1/2/L scores of 47.81/24.00/40.21
- Built on PyTorch with Transformers 4.25.1
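The sketch below shows how the reported hyperparameters map onto Hugging Face `Seq2SeqTrainingArguments`. Dataset loading and tokenization are assumed, and the variable names are illustrative; this is not the original training script.

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-samsum",
    learning_rate=5e-5,               # as reported above
    lr_scheduler_type="linear",       # linear learning rate scheduler
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    predict_with_generate=True,       # generate summaries during evaluation
    evaluation_strategy="epoch",      # assumption: evaluate once per epoch
)

# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=tokenized_samsum["train"],        # assumes a tokenized SAMSum split
#     eval_dataset=tokenized_samsum["validation"],
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
# )
# trainer.train()
```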
Core Capabilities
- Specialized in dialogue summarization
- Optimized for text-to-text generation tasks
- Average generation length of 17.3 tokens
- Consistent performance across ROUGE-1, ROUGE-2, and ROUGE-L (see the evaluation sketch below)
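ROUGE scores of this kind can be computed with the `evaluate` library, as in this sketch; `predictions` and `references` stand in for the model's generated summaries and the SAMSum reference summaries.

```python
import evaluate

rouge = evaluate.load("rouge")

# Illustrative placeholders; in practice these are the generated and gold summaries
predictions = ["Anna and Ben will meet for lunch at 12:30 tomorrow."]
references = ["Anna and Ben confirm their lunch for tomorrow at 12:30."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # includes rouge1, rouge2, rougeL, rougeLsum
```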
Frequently Asked Questions
Q: What makes this model unique?
This model combines the instruction-tuned FLAN-T5 architecture with fine-tuning on SAMSum dialogues, achieving strong ROUGE scores while keeping summaries concise (about 17 tokens on average).
Q: What are the recommended use cases?
The model is particularly well-suited for summarizing conversations, chat logs, and dialogue-heavy content, making it ideal for applications in customer service, meeting summarization, and social media content analysis.