mbart_ru_sum_gazeta
| Property | Value |
|---|---|
| Parameter Count | 867M |
| License | Apache 2.0 |
| Paper | Dataset for Automatic Summarization of Russian News |
| Author | IlyaGusev |
What is mbart_ru_sum_gazeta?
mbart_ru_sum_gazeta is a Russian news summarization model based on the mBART architecture. Developed by IlyaGusev, it was trained on the Gazeta dataset and generates concise, accurate summaries of Russian news articles.
Implementation Details
The model uses the mBART architecture with 867M parameters and stores its weights as F32 tensors. It is implemented with PyTorch and the Transformers library, and is distributed in the Safetensors format. Generation applies a no-repeat-ngram size of 4 to suppress repetitive output.
- Achieves ROUGE-1 F1 score of 32.4 on Gazeta v1 test set
- Supports maximum input length of 600 tokens
- Generates summaries with maximum length of 200 tokens
- Decodes with beam search using 5 beams
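The constraints above map directly onto standard Transformers generation arguments. The following is a minimal sketch, assuming the Hugging Face model ID `IlyaGusev/mbart_ru_sum_gazeta` and the generation values stated on this card; the `summarize` helper is illustrative, not part of the model's API.

```python
# Generation settings mirroring the values stated on this card.
GEN_KWARGS = {
    "max_length": 200,          # summary capped at 200 tokens
    "num_beams": 5,             # beam search with 5 beams
    "no_repeat_ngram_size": 4,  # block repeated 4-grams
}
MAX_INPUT_TOKENS = 600          # input truncated to 600 tokens


def summarize(article_text: str) -> str:
    """Summarize one Russian news article (illustrative helper)."""
    # Deferred import so the sketch reads without transformers installed.
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    model_name = "IlyaGusev/mbart_ru_sum_gazeta"
    tokenizer = MBartTokenizer.from_pretrained(model_name)
    model = MBartForConditionalGeneration.from_pretrained(model_name)

    inputs = tokenizer(
        [article_text],
        max_length=MAX_INPUT_TOKENS,
        truncation=True,
        return_tensors="pt",
    )
    output_ids = model.generate(input_ids=inputs["input_ids"], **GEN_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Articles longer than 600 tokens are silently truncated before generation, so very long pieces lose their tails rather than raising an error.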
Core Capabilities
- Automatic summarization of Russian news articles
- Handles long-form content up to 600 tokens
- Produces coherent and fluent Russian language summaries
- Optimized for Gazeta.ru style articles
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically optimized for Russian news summarization, achieving state-of-the-art results on the Gazeta dataset with superior ROUGE and METEOR scores compared to alternatives like RuT5 and RuGPT3.
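The ROUGE-1 F1 score cited above measures unigram overlap between a generated summary and a reference. A minimal pure-Python sketch of the idea (naive whitespace tokenization; real evaluations use proper tokenizers and often stemming):

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A score of 32.4 on Gazeta v1 means roughly a third of the reference unigrams are recovered, balanced against precision, which is strong for abstractive Russian summarization.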
Q: What are the recommended use cases?
The model is best suited for summarizing Russian news articles, particularly those similar to Gazeta.ru's style. However, it may experience domain shift when used with content from other news agencies or different content types.