LSG-BART-Base-4096-MultiNews
| Property | Value |
|---|---|
| Parameter Count | 145 million |
| Architecture | BART-base with LSG attention |
| Maximum Sequence Length | 4096 tokens |
| Model Type | Text-to-text generation |
| Source | Hugging Face |
What is lsg-bart-base-4096-multinews?
This is a specialized BART-based model that implements Local-Sparse-Global (LSG) attention to handle sequences of up to 4096 tokens. It is fine-tuned for multi-document summarization on MultiNews and builds on the BART-base architecture with 6 encoder and 6 decoder layers.
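A minimal loading sketch is shown below. The Hugging Face repo id `ccdv/lsg-bart-base-4096-multinews` is an assumption on our part, and LSG checkpoints ship custom modeling code, so `trust_remote_code=True` is required:

```python
# Minimal loading sketch. The repo id is assumed; LSG models need
# trust_remote_code=True because the attention layers are custom code.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "ccdv/lsg-bart-base-4096-multinews"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)

# Concatenate the source documents and truncate at the 4096-token limit.
text = "First article text...\n\nSecond article text..."
inputs = tokenizer(text, max_length=4096, truncation=True, return_tensors="pt")
```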
Implementation Details
The model uses an attention mechanism that combines local, sparse, and global attention patterns to process long documents efficiently. It achieves strong ROUGE scores (R1: 47.10, R2: 18.94, RL: 25.22) while keeping computation tractable through configurable sparsity patterns.
- Implements multiple sparsity types: Local, Pooling, Stride, Block Stride, Norm, and LSH
- Optimized with block sizes ranging from 32 to 256 tokens
- Fine-tuned using Adam optimizer with carefully selected hyperparameters
- Supports various generation configurations, including beam search and n-gram repetition prevention (see the sketch after this list)
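As a sketch of those generation options, the standard `generate` API in transformers covers beam search and n-gram blocking; the values below are illustrative choices, not the hyperparameters published for this checkpoint:

```python
# Illustrative generation settings (not this model's exact published config).
summary_ids = model.generate(
    **inputs,
    num_beams=5,             # beam search
    no_repeat_ngram_size=3,  # prevents repeated trigrams in the output
    max_length=256,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```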
Core Capabilities
- Long document processing up to 4096 tokens
- Efficient memory usage through sparse attention patterns
- Optimized for multi-document summarization
- Flexible sparsity configurations for different resource constraints (a configuration sketch follows)
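To make that last point concrete, the LSG reference implementation lets you override the attention layout at load time via `from_pretrained` keyword arguments. The kwargs and values below follow that implementation and should be treated as assumptions rather than guarantees for this exact checkpoint; smaller block sizes generally trade some accuracy for lower memory use:

```python
# Sketch of overriding the sparsity layout at load time. The kwargs follow
# the LSG reference implementation and are assumptions for this checkpoint.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    block_size=128,           # local attention block size
    sparse_block_size=128,    # sparse attention block size
    sparsity_factor=4,        # how aggressively sparse tokens are compressed
    sparsity_type="pooling",  # e.g. "norm", "pooling", "stride", "block_stride", "lsh"
)
```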
Frequently Asked Questions
Q: What makes this model unique?
The key innovation is the LSG attention mechanism, which lets the model process sequences of up to 4096 tokens while maintaining strong performance through configurable sparsity patterns and block sizes. This makes it particularly effective for long-document summarization tasks.
Q: What are the recommended use cases?
This model is specifically designed for multi-document summarization tasks where processing long input sequences is crucial. It's particularly useful in scenarios requiring the synthesis of multiple documents or long-form content into concise summaries.