mt5-small-sum-de-en-v2

Maintained By
T-Systems-onsite


Property              Value
Base Model            google/mt5-small
Languages             English and German
Training Data Size    724,835 examples
License               CC BY-NC-SA 3.0
Author                T-Systems-onsite

What is mt5-small-sum-de-en-v2?

mt5-small-sum-de-en-v2 is a bilingual text summarization model fine-tuned from Google's mt5-small. It generates concise summaries in both English and German and was trained on 724,835 examples drawn from news datasets including CNN/Daily Mail, BBC XSum, and German MLSUM.

Implementation Details

The model was trained with a batch size of 3, a maximum source length of 800 tokens, and a maximum target length of 96 tokens. Training ran for 10 epochs with a learning rate of 5e-5 and 2 gradient accumulation steps. As a preprocessing step, existing summaries were removed from the source texts so that the model could not learn to simply extract sentences.
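The hyperparameters above can be collected in a small configuration object to see how they interact. This is a minimal sketch, not the authors' training script: the `TrainingConfig` class and its field names are hypothetical, and `num_devices = 1` is an assumption, since the source does not say how many devices were used.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters stated in the text."""

    batch_size: int = 3
    gradient_accumulation_steps: int = 2
    max_source_length: int = 800
    max_target_length: int = 96
    num_epochs: int = 10
    learning_rate: float = 5e-5
    num_examples: int = 724_835
    # Assumption: one device. The source does not state the device count.
    num_devices: int = 1

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update.
        return self.batch_size * self.gradient_accumulation_steps * self.num_devices

    @property
    def optimizer_steps_per_epoch(self) -> int:
        # Rough estimate: one update per effective batch, last one partial.
        return math.ceil(self.num_examples / self.effective_batch_size)


config = TrainingConfig()
print(config.effective_batch_size)       # 6
print(config.optimizer_steps_per_epoch)  # 120806
```

Under the single-device assumption, each optimizer update sees 3 × 2 = 6 examples, so one pass over the 724,835 examples takes roughly 120,806 updates.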

  • Trained on multiple datasets, including CNN/Daily Mail, XSum, MLSUM German, and SwissText 2019
  • Expects the source prefix "summarize: " at inference time
  • Reports higher ROUGE scores than the previous version and some competing models
  • Training data restricted to examples whose summaries are at most 94 tokens long
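The prefix and length limits listed above could be applied at inference roughly as follows. This is a sketch under assumptions: the Hub ID `T-Systems-onsite/mt5-small-sum-de-en-v2` is inferred from the author and model name on this page, the helper names `build_model_input` and `summarize` are hypothetical, and the Hugging Face `transformers` library (with `sentencepiece` and PyTorch installed) is assumed as the runtime.

```python
SOURCE_PREFIX = "summarize: "
# Assumption: Hub ID inferred from the author and model name on this page.
MODEL_ID = "T-Systems-onsite/mt5-small-sum-de-en-v2"


def build_model_input(article: str) -> str:
    """Collapse whitespace and prepend the source prefix the model expects."""
    return SOURCE_PREFIX + " ".join(article.split())


def summarize(article: str, max_source_length: int = 800,
              max_target_length: int = 96) -> str:
    """Generate a summary; downloads the model on first use."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate long articles to the source length used in training.
    inputs = tokenizer(
        build_model_input(article),
        max_length=max_source_length,
        truncation=True,
        return_tensors="pt",
    )
    # Cap the generated summary at the target length used in training.
    output_ids = model.generate(**inputs, max_length=max_target_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(article_text)` with an English or German news article should return a short summary in the same language. Loading the tokenizer and model once and reusing them would be preferable when summarizing many articles.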

Core Capabilities

  • Bilingual summarization in English and German
  • Achieves a ROUGE-1 score of 37.81 on the CNN/Daily Mail test set
  • Performs well on XSum, with a ROUGE-1 score of 32.48
  • Handles German summarization, with a ROUGE-1 score of 21.78 on MLSUM
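For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The sketch below computes a simplified ROUGE-1 F1 with lowercase whitespace tokenization. It is illustrative only: the function name `rouge1_f1` is hypothetical, and the scores listed above were presumably produced with standard ROUGE tooling, whose tokenization and stemming differ, so this code would not reproduce them exactly.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap with whitespace tokenization."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each word counts at most as often as in either text.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(100 * score, 2))  # 83.33
```

In this example five of the six unigrams match, so precision and recall are both 5/6. ROUGE values are commonly quoted after multiplying by 100, which is assumed to be the scale of the figures above.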

Frequently Asked Questions

Q: What makes this model unique?

This model covers both English and German summarization in a single small model, a combination that many summarization models do not handle well. It reports ROUGE-1 scores of 37.81 on CNN/Daily Mail, 32.48 on XSum, and 21.78 on German MLSUM.

Q: What are the recommended use cases?

The model is best suited for news article summarization in English and German. However, due to its license (CC BY-NC-SA 3.0), it's restricted to non-commercial applications and research purposes.
