lsg-bart-base-4096-multinews

Maintained By
ccdv

Property                  Value
Parameter Count           145 million
Architecture              BART-base with LSG attention
Maximum Sequence Length   4096 tokens
Model Type                Text-to-text generation
Source                    Hugging Face

What is lsg-bart-base-4096-multinews?

lsg-bart-base-4096-multinews is a BART-base model that uses Local-Sparse-Global (LSG) attention to handle input sequences of up to 4096 tokens. It is fine-tuned for multi-document summarization and keeps the BART-base layout of 6 encoder and 6 decoder layers.
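A minimal loading sketch may help make this concrete. It assumes the checkpoint is published on the Hugging Face Hub under the ID `ccdv/lsg-bart-base-4096-multinews` (inferred from the maintainer and model name on this page) and that the LSG layers ship as custom code, which is why `trust_remote_code=True` is passed. The helper names `load_summarizer` and `summarize` are hypothetical.

```python
# Assumption: Hub ID composed from the maintainer and model name on this page.
MODEL_ID = "ccdv/lsg-bart-base-4096-multinews"
MAX_INPUT_TOKENS = 4096  # maximum sequence length stated in the table above


def load_summarizer(model_id=MODEL_ID):
    """Load tokenizer and model; transformers is imported lazily."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # trust_remote_code=True lets transformers run the custom attention
    # code stored alongside the checkpoint (assumed to be required here).
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model


def summarize(text, tokenizer, model, max_new_tokens=128):
    """Truncate the input to the 4096-token window and generate a summary."""
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_INPUT_TOKENS,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `tokenizer, model = load_summarizer()` downloads the weights on first use; `summarize(long_text, tokenizer, model)` then returns a single summary string.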

Implementation Details

The model combines local, sparse, and global attention patterns to process long documents efficiently. It reports ROUGE scores of R1 47.10, R2 18.94, and RL 25.22, and supports several sparsity patterns that keep the attention cost manageable at long sequence lengths.

  • Implements multiple sparsity types: Local, Pooling, Stride, Block Stride, Norm, and LSH
  • Optimized with block sizes ranging from 32 to 256 tokens
  • Fine-tuned with the Adam optimizer
  • Supports various generation configurations including beam search and n-gram repetition prevention
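The options in the list above can be collected into two small dictionaries, one for the attention configuration and one for generation. This is a sketch under assumptions: the keyword names `block_size`, `sparse_block_size`, `sparsity_type`, and `sparsity_factor`, and the lowercase type strings, are assumed to match what the checkpoint's custom configuration accepts, and `"none"` is taken to mean local-only attention. The generation values are illustrative, not the settings behind the reported scores.

```python
# Assumed config strings for the sparsity types listed above;
# "none" stands for local-only attention. Verify against the checkpoint.
SPARSITY_TYPES = ("none", "pooling", "stride", "block_stride", "norm", "lsh")


def lsg_overrides(block_size=256, sparse_block_size=0,
                  sparsity_type="none", sparsity_factor=2):
    """Build attention overrides (assumed parameter names) for from_pretrained."""
    if sparsity_type not in SPARSITY_TYPES:
        raise ValueError(f"unknown sparsity type: {sparsity_type!r}")
    if not 32 <= block_size <= 256:
        raise ValueError("block size outside the 32-256 range described above")
    return {
        "block_size": block_size,
        "sparse_block_size": sparse_block_size,
        "sparsity_type": sparsity_type,
        "sparsity_factor": sparsity_factor,
    }


def generation_kwargs(num_beams=2, no_repeat_ngram_size=7, max_length=64):
    """Beam search plus n-gram repetition blocking for model.generate()."""
    return {
        "num_beams": num_beams,
        "no_repeat_ngram_size": no_repeat_ngram_size,
        "max_length": max_length,
        "early_stopping": True,
    }
```

If the assumed names hold, the first dictionary would be unpacked into `from_pretrained(..., trust_remote_code=True, **lsg_overrides(block_size=64))` and the second into `model.generate(**inputs, **generation_kwargs())`.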

Core Capabilities

  • Long document processing up to 4096 tokens
  • Efficient memory usage through sparse attention patterns
  • Optimized for multi-document summarization
  • Flexible sparsity configurations for different resource constraints
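A rough count of attention scores shows where the memory savings could come from. The sketch below rests on a simplifying assumption that is not taken from this page: each query attends to three local blocks (its own and one neighbor on each side), two sparse blocks, and the global tokens. The function names are hypothetical, and real memory use also depends on heads, layers, and dtype.

```python
def full_attention_scores(seq_len):
    """Score-matrix entries per head when every token attends to every token."""
    return seq_len * seq_len


def lsg_attention_scores(seq_len, block_size=256, sparse_block_size=0, num_global=1):
    """Approximate entries per head under the assumed LSG layout."""
    # Assumption: 3 local blocks + 2 sparse blocks + global tokens per query.
    keys_per_query = 3 * block_size + 2 * sparse_block_size + num_global
    keys_per_query = min(keys_per_query, seq_len)  # cannot exceed the sequence
    return seq_len * keys_per_query


if __name__ == "__main__":
    n = 4096
    for block in (32, 64, 128, 256):
        ratio = full_attention_scores(n) / lsg_attention_scores(n, block_size=block)
        print(f"block_size={block}: about {ratio:.1f}x fewer scores than full attention")
```

Under these assumptions, smaller blocks reduce the score count further, which is the trade-off the flexible sparsity configurations expose.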

Frequently Asked Questions

Q: What makes this model unique?

The model's key innovation lies in its LSG attention mechanism, allowing it to process sequences up to 4096 tokens while maintaining strong performance through various sparsity patterns and block sizes. This makes it particularly effective for long-document summarization tasks.
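The idea can be illustrated with a toy function that lists which key positions a single query may attend to. This is a simplified sketch, not the model's implementation: global tokens are modeled as the first positions of the sequence, the sparse component is a plain stride, and `attended_positions` is a hypothetical name.

```python
def attended_positions(query, seq_len, block_size, num_global=1, stride=None):
    """Return the sorted key positions visible to one query in this toy model."""
    block = query // block_size
    lo = max(0, (block - 1) * block_size)
    hi = min(seq_len, (block + 2) * block_size)

    # Local: the query's own block and its two neighboring blocks.
    keys = set(range(lo, hi))
    # Global: the first positions stand in for global tokens (assumption).
    keys.update(range(min(num_global, seq_len)))
    # Sparse: an optional strided sample across the whole sequence.
    if stride:
        keys.update(range(0, seq_len, stride))
    return sorted(keys)
```

For example, `attended_positions(10, 32, 4)` covers positions 4 through 15 plus position 0, so the query sees 13 of 32 keys instead of all of them.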

Q: What are the recommended use cases?

This model is specifically designed for multi-document summarization tasks where processing long input sequences is crucial. It's particularly useful in scenarios requiring the synthesis of multiple documents or long-form content into concise summaries.
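For multi-document input, the source articles have to be merged into one string before tokenization. The sketch below assumes a Multi-News-style separator of five pipe characters between documents; that separator and the helper name `join_documents` are assumptions, so check the format the checkpoint was actually trained on.

```python
# Assumption: Multi-News-style separator between source documents.
DOC_SEPARATOR = " ||||| "


def join_documents(documents, separator=DOC_SEPARATOR):
    """Normalize whitespace, drop empty documents, and join with the separator."""
    cleaned = [" ".join(doc.split()) for doc in documents if doc and doc.strip()]
    return separator.join(cleaned)
```

The joined string can then be tokenized with truncation at 4096 tokens, so documents near the end of a long bundle may be cut off.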
