legal-pegasus
Property | Value |
---|---|
License | MIT |
Base Model | google/pegasus-cnn_dailymail |
Maximum Input Length | 1024 tokens |
Training Data | SEC Litigation Releases (2700+ documents) |
What is legal-pegasus?
legal-pegasus is a language model for abstractive summarization of legal documents. It is built on Google's PEGASUS architecture and fine-tuned on legal text, with particular emphasis on SEC litigation releases and complaints. On legal document summarization it reports a ROUGE-1 score of 57.39%, compared with 43.16% for base PEGASUS.
Implementation Details
The model is implemented with the Transformers library on a PyTorch backend. It uses a sequence-to-sequence architecture with a maximum input length of 1024 tokens, so longer legal documents must be truncated or split before summarization. Generation uses beam search with 9 beams, together with the following length-control and repetition-prevention parameters:
- No-repeat ngram size: 3
- Length penalty: 2.0
- Minimum output length: 150 tokens
- Maximum output length: 250 tokens
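The settings above can be gathered into a single set of generation arguments. The following is a minimal sketch, not the authors' reference code: the model identifier `nsi319/legal-pegasus` and the helper name `summarize` are assumptions that do not appear in this document, so substitute the identifier under which the model is actually published.

```python
# Generation settings taken from the list above (9 beams per the text).
GENERATION_KWARGS = {
    "num_beams": 9,
    "no_repeat_ngram_size": 3,
    "length_penalty": 2.0,
    "min_length": 150,
    "max_length": 250,
}

MAX_INPUT_TOKENS = 1024  # maximum input length stated in the table


def summarize(text, model_id="nsi319/legal-pegasus"):
    """Summarize `text`; `model_id` is an assumed Hub identifier."""
    # Imported lazily so the settings above can be inspected without
    # having transformers and PyTorch installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Inputs longer than the limit are truncated, not rejected.
    inputs = tokenizer(
        text,
        max_length=MAX_INPUT_TOKENS,
        truncation=True,
        return_tensors="pt",
    )
    summary_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Calling `summarize(document_text)` downloads the weights on first use. Keeping the settings in one dictionary makes it easy to vary a single parameter, such as the beam count, when comparing outputs.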
Core Capabilities
- Abstractive summarization of legal documents
- Significantly improved performance over base PEGASUS (57.39% vs 43.16% ROUGE-1)
- Specialized in SEC litigation document processing
- Handles complex legal terminology and context
Frequently Asked Questions
Q: What makes this model unique?
This model is trained specifically on SEC litigation releases, which makes it particularly effective for legal document summarization. Its ROUGE-1 score is 57.39%, against 43.16% for the base PEGASUS model, an increase of 14.23 percentage points.
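The arithmetic behind that comparison is worth making explicit, since an absolute gain in percentage points is easily confused with a relative improvement. The short sketch below uses only the two ROUGE-1 figures quoted in this document; the variable names are illustrative.

```python
# ROUGE-1 scores quoted in this document, in percent.
legal_pegasus_rouge1 = 57.39
base_pegasus_rouge1 = 43.16

# Absolute gain, in percentage points.
absolute_gain = round(legal_pegasus_rouge1 - base_pegasus_rouge1, 2)

# Relative gain, as a percentage of the base model's score.
relative_gain = absolute_gain / base_pegasus_rouge1 * 100

print(f"Absolute gain: {absolute_gain} percentage points")
print(f"Relative gain: {relative_gain:.1f}% over the base score")
```

The absolute gain is 14.23 percentage points, which corresponds to a relative improvement of roughly one third over the base score.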
Q: What are the recommended use cases?
The model is best suited to summarizing legal documents, particularly SEC filings, litigation releases, and legal complaints. It accepts inputs of up to 1024 tokens and aims to produce coherent, accurate summaries within that limit; longer documents need to be truncated or split into sections first.
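For documents that exceed the 1024-token limit, one possible approach is to split the tokenized input into consecutive windows and summarize each window separately. This is an illustrative sketch, not something this document prescribes: the helper `chunk_token_ids` is hypothetical, and how the per-window summaries are combined is left open.

```python
def chunk_token_ids(token_ids, max_tokens=1024):
    """Split a list of token IDs into consecutive windows of at most
    `max_tokens` items. Hypothetical helper for illustration only."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [
        token_ids[start:start + max_tokens]
        for start in range(0, len(token_ids), max_tokens)
    ]


# Example with placeholder IDs standing in for a tokenized document.
chunks = chunk_token_ids(list(range(2500)))
print([len(chunk) for chunk in chunks])  # [1024, 1024, 452]
```

A tokenizer may append special tokens such as an end-of-sequence marker, so in practice a slightly smaller window may be needed. Splitting on fixed token counts can also cut a sentence in half; splitting on paragraph or section boundaries is an alternative worth considering.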