led-large-book-summary
| Property | Value |
|---|---|
| Parameter Count | 460M |
| License | BSD-3-Clause |
| Paper | BookSum Paper |
| Max Input Length | 16,384 tokens |
What is led-large-book-summary?
led-large-book-summary is a text summarization model based on the LED (Longformer Encoder-Decoder) architecture, designed for long documents. Fine-tuned on the BookSum dataset, it generates concise, coherent summaries of texts up to 16,384 tokens long.
Implementation Details
The model underwent extensive training across 13+ epochs on the BookSum dataset, with careful hyperparameter tuning throughout different training stages. It utilizes features like `encoder_no_repeat_ngram_size` and global attention masks to produce high-quality summaries.
- Trained using transformers 4.19.2 and PyTorch 1.11.0
- Uses beam search with `num_beams=4`
- Applies a repetition penalty of 3.5 to discourage repeated phrases
- Supports variable length summaries with configurable min/max lengths
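Put together, these settings map onto Hugging Face `transformers` generation kwargs. A minimal sketch of invoking the model this way; note the Hub repo id, the `encoder_no_repeat_ngram_size` value of 3, and the min/max lengths are assumptions for illustration, not values stated on this card:

```python
# Decoding settings drawn from the list above.
GEN_KWARGS = {
    "num_beams": 4,                     # beam search width
    "repetition_penalty": 3.5,          # discourages repeated phrases
    "encoder_no_repeat_ngram_size": 3,  # the n-gram size here is an assumption
    "min_length": 16,                   # illustrative lower bound
    "max_length": 256,                  # illustrative upper bound
}


def summarize(long_text: str) -> str:
    """Run the summarization pipeline with the settings above.

    The repo id below is an assumed Hub path -- substitute the actual
    checkpoint location if it differs.
    """
    # Imported lazily: calling this function downloads the checkpoint.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="pszemraj/led-large-book-summary")
    return summarizer(long_text, **GEN_KWARGS)[0]["summary_text"]
```

Because the heavy import and model download happen inside `summarize`, the settings dict can be inspected or reused without loading the checkpoint.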
Core Capabilities
- Long document processing up to 16K tokens
- Achieves a ROUGE-1 score of 31.73 on the BookSum test set
- Handles various text types including academic papers, books, and articles
- Efficient token batching for processing lengthy documents
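The token batching mentioned above can be sketched as a sliding window over a flat list of token ids. This is a hypothetical helper, not an API of the model: the non-overlapping stride is a design choice (overlapping windows are common in practice, but this card does not specify an overlap):

```python
def batch_tokens(token_ids, window=16384, stride=16384):
    """Split a long token sequence into windows the model can ingest.

    `window` matches the model's 16,384-token input limit; setting
    `stride` smaller than `window` would produce overlapping chunks.
    """
    return [token_ids[i : i + window] for i in range(0, len(token_ids), stride)]


chunks = batch_tokens(list(range(40000)))
# -> window lengths: [16384, 16384, 7232]
```

Each chunk can then be summarized independently, with the per-chunk summaries optionally concatenated and summarized again for very long books.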
Frequently Asked Questions
Q: What makes this model unique?
This model's ability to handle extremely long documents (up to 16,384 tokens) while maintaining coherent summaries sets it apart. It's specifically optimized for book-length content and academic materials, making it ideal for research and content summarization tasks.
Q: What are the recommended use cases?
The model is particularly well-suited for summarizing academic papers, book chapters, long-form articles, and research documents. It's optimized for cases where maintaining context across long passages is crucial.