led-large-book-summary
| Property | Value |
|---|---|
| Parameter Count | 460M |
| License | BSD-3-Clause |
| Paper | BookSum Paper |
| Max Input Length | 16,384 tokens |
What is led-large-book-summary?
led-large-book-summary is a text summarization model based on the LED (Longformer Encoder-Decoder) architecture, designed for long documents. Fine-tuned on the BookSum dataset, it generates concise, coherent summaries of texts up to 16,384 tokens long.
Implementation Details
The model underwent extensive training across 13+ epochs on the BookSum dataset, with careful hyperparameter tuning throughout different training stages. It utilizes features like `encoder_no_repeat_ngram_size` and global attention masks to produce high-quality summaries.
- Trained using transformers 4.19.2 and PyTorch 1.11.0
- Uses beam search with `num_beams=4`
- Applies a repetition penalty of 3.5 to discourage repeated phrases
- Supports variable length summaries with configurable min/max lengths
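Put together, these settings map onto Hugging Face `transformers` generation kwargs. A minimal sketch of invoking the model this way; note the Hub repo id, the `encoder_no_repeat_ngram_size` value of 3, and the min/max lengths are assumptions for illustration, not values stated on this card:

```python
# Decoding settings drawn from the list above.
GEN_KWARGS = {
    "num_beams": 4,                     # beam search width
    "repetition_penalty": 3.5,          # discourages repeated phrases
    "encoder_no_repeat_ngram_size": 3,  # the n-gram size here is an assumption
    "min_length": 16,                   # illustrative lower bound
    "max_length": 256,                  # illustrative upper bound
}


def summarize(long_text: str) -> str:
    """Run the summarization pipeline with the settings above.

    The repo id below is an assumed Hub path -- substitute the actual
    checkpoint location if it differs.
    """
    # Imported lazily: calling this function downloads the checkpoint.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="pszemraj/led-large-book-summary")
    return summarizer(long_text, **GEN_KWARGS)[0]["summary_text"]
```

Because the heavy import and model download happen inside `summarize`, the settings dict can be inspected or reused without loading the checkpoint.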
Core Capabilities
- Long document processing up to 16K tokens
- Achieves a ROUGE-1 score of 31.73 on the BookSum test set
- Handles various text types including academic papers, books, and articles
- Efficient token batching for processing lengthy documents
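The token batching mentioned above can be sketched as a sliding window over a flat list of token ids. This is a hypothetical helper, not an API of the model: the non-overlapping stride is a design choice (overlapping windows are common in practice, but this card does not specify an overlap):

```python
def batch_tokens(token_ids, window=16384, stride=16384):
    """Split a long token sequence into windows the model can ingest.

    `window` matches the model's 16,384-token input limit; setting
    `stride` smaller than `window` would produce overlapping chunks.
    """
    return [token_ids[i : i + window] for i in range(0, len(token_ids), stride)]


chunks = batch_tokens(list(range(40000)))
# -> window lengths: [16384, 16384, 7232]
```

Each chunk can then be summarized independently, with the per-chunk summaries optionally concatenated and summarized again for very long books.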
Frequently Asked Questions
Q: What makes this model unique?
This model's ability to handle extremely long documents (up to 16,384 tokens) while maintaining coherent summaries sets it apart. It's specifically optimized for book-length content and academic materials, making it ideal for research and content summarization tasks.
Q: What are the recommended use cases?
The model is particularly well-suited for summarizing academic papers, book chapters, long-form articles, and research documents. It's optimized for cases where maintaining context across long passages is crucial.