t5-base-dutch
Property | Value |
---|---|
Parameter Count | 223M |
Model Type | T5 |
Training Duration | 2 days 9 hours |
Sequence Length | 512 |
Tokenizer | SentencePiece (32003 tokens) |
What is t5-base-dutch?
t5-base-dutch is a pre-trained Dutch language model based on the T5 architecture, developed during the Hugging Face community week with support from Google's TPU Research Cloud. The model was trained on cleaned Dutch mC4 data and achieves a 0.70 evaluation accuracy on masked language modeling tasks.
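Below is a minimal loading sketch using the transformers library. The Hub repository id is an assumption here (the card does not state it); adjust it to the actual repository name.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Hub id is an assumption; replace with the actual repository name.
MODEL_ID = "yhavinga/t5-base-dutch"

tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)

# The pre-trained checkpoint is a denoising model only; it needs
# fine-tuning before it is useful on a downstream task.
```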
Implementation Details
The model uses a base T5 architecture with 12 layers, 768 hidden dimensions, and 3072 feed-forward dimensions; a configuration sketch reproducing these dimensions follows the list below. It was trained for a single epoch with a batch size of 128 over 527,500 steps, processing approximately 35B tokens. The model employs ReLU activation and a dropout rate of 0.1.
- Pre-trained with a masked language modeling objective (denoising of corrupted token spans, the standard T5 objective)
- Trained on cleaned Dutch mC4 dataset with extensive filtering for quality
- Uses a cased SentencePiece tokenizer with NFKC normalization
- Optimized with the Adafactor optimizer at a learning rate of 0.005
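The architecture described above can be expressed as a transformers T5Config. This is a sketch for orientation, not the original training code; the number of attention heads (12) is the standard T5-base value and an assumption, not a figure stated in this card.

```python
from transformers import T5Config, T5ForConditionalGeneration

# Config matching the dimensions stated above (vocab size from the
# tokenizer entry in the table; num_heads=12 is assumed from T5-base).
config = T5Config(
    vocab_size=32003,
    d_model=768,          # hidden dimension
    d_ff=3072,            # feed-forward dimension
    num_layers=12,        # encoder layers (decoder mirrors this by default)
    num_heads=12,
    dropout_rate=0.1,
    feed_forward_proj="relu",
)

model = T5ForConditionalGeneration(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # roughly 223M
```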
Core Capabilities
- Achieves a 33.38 ROUGE-1 score on Dutch summarization tasks
- Demonstrates a 45.88 BLEU score on English-to-Dutch translation
- Suitable for fine-tuning on various Dutch language tasks (see the usage sketch after this list)
- Processes 3.18 samples per second during summarization
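The sketch below shows how a fine-tuned derivative of this model would be used for summarization with the transformers pipeline API. The checkpoint name is hypothetical; the base model itself must be fine-tuned before it can summarize or translate.

```python
from transformers import pipeline

# Hypothetical fine-tuned checkpoint; the pre-trained base model alone
# will not produce useful summaries.
summarizer = pipeline("summarization", model="yhavinga/t5-base-dutch-summarization")

text = "Hier staat een lange Nederlandse tekst die samengevat moet worden ..."
print(summarizer(text, max_length=64, min_length=16)[0]["summary_text"])
```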
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically optimized for Dutch language processing, trained on cleaned Dutch data with careful filtering. It offers a balance between model size and performance, making it suitable for various downstream tasks after fine-tuning.
Q: What are the recommended use cases?
The model is designed for Dutch language tasks including summarization, translation, and other text-to-text applications. It requires fine-tuning before it can be used on a specific downstream task, and has shown strong performance particularly on Dutch summarization and English-to-Dutch translation; a fine-tuning sketch follows below.
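The following is a minimal fine-tuning sketch with Seq2SeqTrainer, assuming a Hub id such as yhavinga/t5-base-dutch (an assumption, as above). The in-memory toy dataset and the training hyperparameters are illustrative placeholders, not values from this card.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

MODEL_ID = "yhavinga/t5-base-dutch"  # hub id is an assumption

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

# Toy in-memory dataset standing in for a real Dutch summarization corpus.
raw = Dataset.from_dict({
    "text": ["Een lang Nederlands artikel dat samengevat moet worden ..."],
    "summary": ["Korte samenvatting."],
})

def preprocess(batch):
    # Inputs are truncated to the model's 512-token sequence length.
    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="t5-base-dutch-summarization",
        learning_rate=3e-4,  # a typical T5 fine-tuning value, not from this card
        per_device_train_batch_size=8,
        num_train_epochs=1,
        predict_with_generate=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```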