# bart-large-finetuned-filtered-spotify-podcast-summ
| Property | Value |
|---|---|
| Base Model | facebook/bart-large-cnn |
| License | MIT |
| Paper | Research Paper |
| Training Dataset Size | 69,336 episodes |
## What is bart-large-finetuned-filtered-spotify-podcast-summ?
This is a specialized summarization model fine-tuned on the Spotify Podcast Dataset, built upon the BART-large-CNN architecture. It's designed to generate concise, readable summaries of podcast transcripts that help users decide whether to listen to a particular episode. The model achieved a training loss of 2.2967 and validation loss of 2.8316 after 2 epochs of training.
## Implementation Details
The model follows a two-stage approach: an extractive module first selects the most important transcript segments, which are then condensed by the abstractive BART summarizer. Training used the AdamWeightDecay optimizer with a learning rate of 2e-05 in float32 precision.
- Training set: 69,336 episodes
- Validation set: 7,705 episodes
- Test set: 1,025 episodes
- Supports variable-length summaries (39-250 tokens)
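The extractive stage described above can be sketched as follows. This is a minimal illustration using a simple word-frequency scoring heuristic; the heuristic, function name, and scoring details are assumptions for illustration, not the selection model actually used in training.

```python
# Minimal sketch of an extractive pre-selection stage (assumed heuristic:
# score each sentence by the average corpus frequency of its content words,
# then keep the top-k sentences in their original order).
import re
from collections import Counter


def select_segments(transcript: str, k: int = 3) -> str:
    """Return the k highest-scoring sentences, preserving document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    # Crude content-word filter: lowercase tokens longer than 3 characters.
    words = re.findall(r"[a-z']+", transcript.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence: str) -> float:
        toks = [w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3]
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))
```

The selected segments, rather than the full transcript, are then fed to the abstractive summarizer, which keeps the input within BART's context window.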
## Core Capabilities
- Automatic podcast transcript summarization
- Human-readable summary generation
- Mobile-friendly output length
- Content-faithful summarization
## Frequently Asked Questions
**Q: What makes this model unique?**
This model is specifically optimized for podcast content summarization, combining extractive and abstractive techniques to create concise, informative summaries. It's trained on a carefully filtered dataset to ensure high-quality outputs suitable for quick consumption on mobile devices.
**Q: What are the recommended use cases?**
The model is ideal for automated podcast content previews, content management systems, and podcast platforms looking to provide quick episode overviews. It's particularly suited for scenarios where users need to quickly decide whether to invest time in listening to a full podcast episode.
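A hypothetical inference sketch using the Hugging Face `transformers` summarization pipeline. The helper names are illustrative, the hub namespace in the model id is left as a placeholder (check the model page for the exact id), and the length bounds come from the 39-250 token range stated above.

```python
# Hedged usage sketch: summarize a podcast transcript with the fine-tuned model.
def generation_kwargs(min_tokens: int = 39, max_tokens: int = 250) -> dict:
    """Length bounds matching the card's supported summary range."""
    return {"min_length": min_tokens, "max_length": max_tokens, "truncation": True}


def summarize(transcript: str, model_id: str) -> str:
    # Deferred import so the helpers above work without transformers installed.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model_id)
    return summarizer(transcript, **generation_kwargs())[0]["summary_text"]


# Example (downloads the model weights on first use):
# print(summarize(open("episode_transcript.txt").read(),
#                 "<namespace>/bart-large-finetuned-filtered-spotify-podcast-summ"))
```

For long episodes, the transcript should first be reduced by the extractive stage so it fits within the summarizer's input limit.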