PEGASUS-BillSum

Property	Value
Base Model	google/pegasus-large
Training Data	BillSum Dataset
Model Hub	HuggingFace
ROUGE-1 Score	56.87

What is pegasus-billsum?

PEGASUS-BillSum is a text summarization model fine-tuned on the BillSum dataset to generate concise summaries of legislative bills. It is built on google/pegasus-large and adapted to the dense, formal language of legislation.

Implementation Details

The model was trained with transformers v4.13. Training ran for 12,000 steps (about 6.6 epochs) using the Adafactor optimizer with a learning rate of 2e-4 and label smoothing of 0.1. Inputs are limited to 1024 tokens and generated summaries to 256 tokens.
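The training hyperparameters stated in this section could map onto Hugging Face `Seq2SeqTrainingArguments` roughly as follows. This is a sketch under assumptions, not the original training script: `output_dir` is a hypothetical path, `predict_with_generate` is an assumed setting, and `optim="adafactor"` assumes a recent `transformers` release (to my understanding, older releases such as v4.13 exposed this as a boolean `adafactor` flag instead).

```python
# Hyperparameters taken from the text; anything marked "assumption" is not.
TRAINING_HPARAMS = {
    "max_steps": 12_000,                 # about 6.6 epochs per the text
    "learning_rate": 2e-4,
    "label_smoothing_factor": 0.1,
    "per_device_train_batch_size": 2,
    "optim": "adafactor",                # assumption: recent transformers release
    "predict_with_generate": True,       # assumption: evaluate on generated text
}


def build_training_args(output_dir="pegasus-billsum-out"):  # hypothetical path
    """Build Seq2SeqTrainingArguments from the hyperparameters above."""
    # Imported lazily so the dict can be inspected without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_HPARAMS)
```

Keeping the values in a plain dictionary makes it easy to compare a reproduction run against the numbers reported here before any model is loaded.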

Uses beam search with 8 beams for generation
Trained with batch size of 2 per device across multiple GPUs
Reported ROUGE scores: ROUGE-1 56.87, ROUGE-2 38.65, ROUGE-L 44.84
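For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The sketch below computes a simplified ROUGE-1 F1 with lowercasing and whitespace tokenization. It is an illustration only, not the scoring script behind the numbers above (standard implementations typically add stemming and more careful tokenization), and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

The function returns a value between 0 and 1, whereas the scores in this document are on a 0 to 100 scale.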

Core Capabilities

Efficient summarization of legislative documents
Handles complex legal terminology and structure
Generates concise summaries that retain key information
Processes inputs of up to 1024 tokens
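The settings described in this document (1024-token input limit, 256-token summaries, beam search with 8 beams) can be passed to the `transformers` generation API roughly as follows. This is a minimal sketch under assumptions: the document does not give the Hub repository id, so `model_id` must be supplied by the caller, and the function name `summarize` is hypothetical.

```python
MAX_INPUT_TOKENS = 1024                                   # input limit from the text
GENERATION_KWARGS = {"num_beams": 8, "max_length": 256}   # values from the text


def summarize(text: str, model_id: str) -> str:
    """Summarize `text` with the checkpoint at `model_id` (caller-supplied)."""
    # Imported lazily so the constants can be inspected without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Tokens beyond the limit are dropped by truncation.
    inputs = tokenizer(
        text,
        truncation=True,
        max_length=MAX_INPUT_TOKENS,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because `truncation=True` discards everything past the input limit, bills longer than 1024 tokens would need to be split or shortened before being passed in.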

Frequently Asked Questions

Q: What makes this model unique?

This model specializes in legislative text summarization. It is fine-tuned on the BillSum dataset and reports ROUGE-1 56.87, ROUGE-2 38.65, and ROUGE-L 44.84, which makes it well suited to summarizing bills and similar legal documents.

Q: What are the recommended use cases?

The model is best suited for summarizing legislative bills, legal documents, and policy papers. It's particularly useful for legal professionals, policy analysts, and researchers who need to quickly digest lengthy legislative texts.
