paraphraser-bart-large
| Property | Value |
|---|---|
| Base Model | facebook/bart-large |
| Paper | AutoQA: From Databases to QA Semantic Parsers with Only Synthetic Training Data |
| Author | stanford-oval |
| Training Data | ParaBank 2 (5M sentence pairs) |
What is paraphraser-bart-large?
paraphraser-bart-large is a sentence-level paraphrasing model published by stanford-oval. It is a fine-tuned version of facebook/bart-large, trained on a cleaned subset of the ParaBank 2 dataset: 5 million sentence pairs produced by back-translating Czech-English data.
Implementation Details
The model is facebook/bart-large fine-tuned for 4 epochs on the cleaned ParaBank 2 data. Training minimizes a token-level cross-entropy loss on mini-batches of 1280 examples, with sentences grouped by length so that each batch trains efficiently.
- Trained on back-translated Czech-English pairs for grammatical accuracy
- Uses a cleaned dataset, with URLs and sentences containing excessive special characters removed
- Implements efficient batch processing with length-based grouping
- Supports controllable generation with top_p and temperature parameters
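The data preparation steps above (cleaning, then grouping sentences by length into mini-batches) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' training code: the helper names, the URL pattern, the `max_special_ratio` threshold, and the use of whitespace tokens as the length measure are all hypothetical choices.

```python
import re
from typing import Iterator, List, Tuple

Pair = Tuple[str, str]  # (source sentence, paraphrase)

# Assumed URL heuristic; the actual cleaning rules are not specified.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def looks_clean(sentence: str, max_special_ratio: float = 0.3) -> bool:
    """Return True if the sentence has no URL and few special characters."""
    if not sentence.strip() or URL_PATTERN.search(sentence):
        return False
    # Count characters that are neither alphanumeric nor whitespace.
    special = sum(1 for ch in sentence if not (ch.isalnum() or ch.isspace()))
    return special / len(sentence) <= max_special_ratio


def clean_pairs(pairs: List[Pair]) -> List[Pair]:
    """Keep only pairs where both sides pass the filter."""
    return [(src, tgt) for src, tgt in pairs if looks_clean(src) and looks_clean(tgt)]


def length_grouped_batches(pairs: List[Pair], batch_size: int) -> Iterator[List[Pair]]:
    """Yield batches of pairs ordered by source length in whitespace tokens."""
    ordered = sorted(pairs, key=lambda pair: len(pair[0].split()))
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

Sorting by length puts similarly sized sentences in the same batch, which limits padding. A real training loop would typically also shuffle the order of the batches between epochs.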
Core Capabilities
- Sentence-level paraphrasing with high grammatical accuracy
- Controllable diversity through the temperature parameter (between 0 and 1)
- Maintains semantic meaning while varying expression
- Optimized for single-sentence transformation
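A minimal sketch of temperature-controlled paraphrasing with the Hugging Face `transformers` library follows. It assumes the checkpoint is published on the Hub as `stanford-oval/paraphraser-bart-large` (inferred from the author and model name) and that it loads with the standard BART classes. The `sampling_kwargs` and `paraphrase` helpers and the `max_new_tokens=64` cap are hypothetical. Because `generate` expects a strictly positive temperature when sampling, the sketch treats the range as (0, 1].

```python
from typing import Dict, List

# Assumed Hub identifier, inferred from the author and model name.
MODEL_ID = "stanford-oval/paraphraser-bart-large"


def sampling_kwargs(temperature: float = 0.7, top_p: float = 0.9,
                    num_return_sequences: int = 1) -> Dict[str, object]:
    """Build nucleus-sampling arguments for `generate`."""
    if not 0.0 < temperature <= 1.0:
        raise ValueError("temperature must be in (0, 1]")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    return {
        "do_sample": True,          # sample instead of greedy/beam decoding
        "top_p": top_p,
        "temperature": temperature,
        "num_return_sequences": num_return_sequences,
        "max_new_tokens": 64,       # illustrative cap for a single sentence
    }


def paraphrase(sentence: str, **overrides) -> List[str]:
    """Generate one or more paraphrases of a single sentence."""
    # Imported lazily so sampling_kwargs works without transformers installed.
    from transformers import AutoTokenizer, BartForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = BartForConditionalGeneration.from_pretrained(MODEL_ID)
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, **sampling_kwargs(**overrides))
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `paraphrase("The weather is nice today.", temperature=0.3)` would request a conservative rewrite, while `temperature=0.9, num_return_sequences=3` would request several more varied ones. Outputs are sampled, so they differ from run to run.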
Frequently Asked Questions
Q: What makes this model unique?
The model targets grammatically correct paraphrases that preserve the meaning of the input. It does so by training on back-translated data that was cleaned before use. Output diversity can be adjusted at generation time through the temperature setting, so the same model covers use cases that need either conservative or more varied rewrites.
Q: What are the recommended use cases?
The model is best suited to sentence-level paraphrasing. For good results, use top_p=0.9 and a temperature between 0 and 1; higher values produce more diverse paraphrases. When working with paragraphs, split them into individual sentences first and paraphrase each sentence separately.
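The paragraph-splitting advice can be sketched as below. The regex is a deliberately simple, assumed heuristic (it will wrongly split after abbreviations such as "Dr."), and `paraphrase_one` is a hypothetical callable standing in for whatever function sends one sentence to the model.

```python
import re
from typing import Callable, List

# Split on whitespace that follows ., ! or ? and precedes an uppercase
# letter or an opening quote. A rough heuristic, not a full segmenter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")


def split_sentences(paragraph: str) -> List[str]:
    """Split a paragraph into sentences using the heuristic above."""
    parts = SENTENCE_BOUNDARY.split(paragraph.strip())
    return [part.strip() for part in parts if part.strip()]


def paraphrase_paragraph(paragraph: str,
                         paraphrase_one: Callable[[str], str]) -> str:
    """Paraphrase each sentence separately and rejoin the results."""
    return " ".join(paraphrase_one(s) for s in split_sentences(paragraph))
```

For production text, a dedicated sentence segmenter is likely a better choice than this regex.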