t5-base-dutch
Property | Value |
---|---|
Parameter Count | 223M |
Model Type | T5 |
Training Duration | 2 days 9 hours |
Sequence Length | 512 |
Tokenizer | SentencePiece (32003 tokens) |
What is t5-base-dutch?
t5-base-dutch is a pre-trained Dutch language model based on the T5 architecture, developed during the Hugging Face community week with support from Google's TPU Research Cloud. The model was trained on cleaned Dutch mC4 data and achieves a 0.70 evaluation accuracy on masked language modeling tasks.
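Below is a minimal loading sketch using the transformers library. The Hub repository id is an assumption here (the card does not state it); adjust it to the actual repository name.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Hub id is an assumption; replace with the actual repository name.
MODEL_ID = "yhavinga/t5-base-dutch"

tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)

# The pre-trained checkpoint is a denoising model only; it needs
# fine-tuning before it is useful on a downstream task.
```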
Implementation Details
The model uses a base T5 architecture with 12 layers, 768 hidden dimensions, and 3072 feed-forward dimensions; a configuration sketch reproducing these dimensions follows the list below. It was trained for a single epoch with a batch size of 128 over 527,500 steps, processing approximately 35B tokens. The model employs ReLU activation and a dropout rate of 0.1.
- Pre-trained with a masked language modeling objective (denoising of corrupted token spans, the standard T5 objective)
- Trained on cleaned Dutch mC4 dataset with extensive filtering for quality
- Uses a cased SentencePiece tokenizer with NFKC normalization
- Optimized with the Adafactor optimizer at a learning rate of 0.005
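The architecture described above can be expressed as a transformers T5Config. This is a sketch for orientation, not the original training code; the number of attention heads (12) is the standard T5-base value and an assumption, not a figure stated in this card.

```python
from transformers import T5Config, T5ForConditionalGeneration

# Config matching the dimensions stated above (vocab size from the
# tokenizer entry in the table; num_heads=12 is assumed from T5-base).
config = T5Config(
    vocab_size=32003,
    d_model=768,          # hidden dimension
    d_ff=3072,            # feed-forward dimension
    num_layers=12,        # encoder layers (decoder mirrors this by default)
    num_heads=12,
    dropout_rate=0.1,
    feed_forward_proj="relu",
)

model = T5ForConditionalGeneration(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # roughly 223M
```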
Core Capabilities
- Achieves a 33.38 ROUGE-1 score on Dutch summarization tasks
- Demonstrates a 45.88 BLEU score on English-to-Dutch translation
- Suitable for fine-tuning on various Dutch language tasks (see the usage sketch after this list)
- Processes 3.18 samples per second during summarization
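The sketch below shows how a fine-tuned derivative of this model would be used for summarization with the transformers pipeline API. The checkpoint name is hypothetical; the base model itself must be fine-tuned before it can summarize or translate.

```python
from transformers import pipeline

# Hypothetical fine-tuned checkpoint; the pre-trained base model alone
# will not produce useful summaries.
summarizer = pipeline("summarization", model="yhavinga/t5-base-dutch-summarization")

text = "Hier staat een lange Nederlandse tekst die samengevat moet worden ..."
print(summarizer(text, max_length=64, min_length=16)[0]["summary_text"])
```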
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically optimized for Dutch language processing, trained on cleaned Dutch data with careful filtering. It offers a balance between model size and performance, making it suitable for various downstream tasks after fine-tuning.
Q: What are the recommended use cases?
The model is designed for Dutch language tasks including summarization, translation, and other text-to-text applications. It requires fine-tuning before it can be used on a specific downstream task, and has shown strong performance particularly on Dutch summarization and English-to-Dutch translation; a fine-tuning sketch follows below.
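The following is a minimal fine-tuning sketch with Seq2SeqTrainer, assuming a Hub id such as yhavinga/t5-base-dutch (an assumption, as above). The in-memory toy dataset and the training hyperparameters are illustrative placeholders, not values from this card.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

MODEL_ID = "yhavinga/t5-base-dutch"  # hub id is an assumption

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

# Toy in-memory dataset standing in for a real Dutch summarization corpus.
raw = Dataset.from_dict({
    "text": ["Een lang Nederlands artikel dat samengevat moet worden ..."],
    "summary": ["Korte samenvatting."],
})

def preprocess(batch):
    # Inputs are truncated to the model's 512-token sequence length.
    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="t5-base-dutch-summarization",
        learning_rate=3e-4,  # a typical T5 fine-tuning value, not from this card
        per_device_train_batch_size=8,
        num_train_epochs=1,
        predict_with_generate=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```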