mbart_ru_sum_gazeta
| Property | Value |
|---|---|
| Parameter Count | 867M |
| License | Apache 2.0 |
| Paper | Dataset for Automatic Summarization of Russian News |
| Author | IlyaGusev |
What is mbart_ru_sum_gazeta?
mbart_ru_sum_gazeta is a Russian news summarization model based on the mBART architecture. Developed by IlyaGusev, it was trained on the Gazeta dataset and generates concise, accurate summaries of Russian news articles.
Implementation Details
The model uses the mBART architecture with 867M parameters and stores its weights as F32 tensors. It is implemented with PyTorch and the Transformers library, and is distributed in the Safetensors format. Generation applies a no-repeat-ngram size of 4 to suppress repetitive output.
- Achieves ROUGE-1 F1 score of 32.4 on Gazeta v1 test set
- Supports maximum input length of 600 tokens
- Generates summaries with maximum length of 200 tokens
- Decodes with beam search using 5 beams
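The constraints above map directly onto standard Transformers generation arguments. The following is a minimal sketch, assuming the Hugging Face model ID `IlyaGusev/mbart_ru_sum_gazeta` and the generation values stated on this card; the `summarize` helper is illustrative, not part of the model's API.

```python
# Generation settings mirroring the values stated on this card.
GEN_KWARGS = {
    "max_length": 200,          # summary capped at 200 tokens
    "num_beams": 5,             # beam search with 5 beams
    "no_repeat_ngram_size": 4,  # block repeated 4-grams
}
MAX_INPUT_TOKENS = 600          # input truncated to 600 tokens


def summarize(article_text: str) -> str:
    """Summarize one Russian news article (illustrative helper)."""
    # Deferred import so the sketch reads without transformers installed.
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    model_name = "IlyaGusev/mbart_ru_sum_gazeta"
    tokenizer = MBartTokenizer.from_pretrained(model_name)
    model = MBartForConditionalGeneration.from_pretrained(model_name)

    inputs = tokenizer(
        [article_text],
        max_length=MAX_INPUT_TOKENS,
        truncation=True,
        return_tensors="pt",
    )
    output_ids = model.generate(input_ids=inputs["input_ids"], **GEN_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Articles longer than 600 tokens are silently truncated before generation, so very long pieces lose their tails rather than raising an error.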
Core Capabilities
- Automatic summarization of Russian news articles
- Handles long-form content up to 600 tokens
- Produces coherent and fluent Russian language summaries
- Optimized for Gazeta.ru style articles
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically optimized for Russian news summarization, achieving state-of-the-art results on the Gazeta dataset with superior ROUGE and METEOR scores compared to alternatives like RuT5 and RuGPT3.
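The ROUGE-1 F1 score cited above measures unigram overlap between a generated summary and a reference. A minimal pure-Python sketch of the idea (naive whitespace tokenization; real evaluations use proper tokenizers and often stemming):

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A score of 32.4 on Gazeta v1 means roughly a third of the reference unigrams are recovered, balanced against precision, which is strong for abstractive Russian summarization.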
Q: What are the recommended use cases?
The model is best suited for summarizing Russian news articles, particularly those similar to Gazeta.ru's style. However, it may experience domain shift when used with content from other news agencies or different content types.