t5-base-nl36-finnish
| Property | Value |
|---|---|
| Parameter Count | 814 million |
| Model Type | T5 Encoder-Decoder |
| Architecture | Deep-Narrow with 36 transformer layers |
| Training Data | 76GB of cleaned Finnish text |
| Vocabulary Size | 32,000 tokens |
What is t5-base-nl36-finnish?
t5-base-nl36-finnish is a Finnish language model based on the T5 architecture and designed specifically for Finnish language processing tasks. With 814 million parameters, it was pretrained on a diverse collection of Finnish text, including news archives, Wikipedia, and web crawl content. The model uses the efficient Deep-Narrow configuration with 36 transformer layers, which makes it particularly effective on downstream tasks after fine-tuning.
Implementation Details
The model was trained with the T5 v1.1 improvements on a TPUv3-8 VM for 1M steps with a batch size of 64, processing approximately 33B tokens. Pretraining uses span-based masked language modeling, and the v1.1 recipe adds GEGLU activation in the feed-forward blocks and drops parameter sharing between the embedding and classifier layers.
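To make the GEGLU activation concrete, here is a minimal plain-Python sketch of a GEGLU feed-forward block. The weight matrices and dimensions are purely illustrative, not the model's actual parameters:

```python
import math

def gelu(x):
    # Exact GELU using the Gaussian CDF: gelu(x) = x * Phi(x)
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def geglu_ffn(x, W, V, W_out):
    # GEGLU feed-forward: FFN(x) = (GELU(x @ W) * (x @ V)) @ W_out
    # W gates the hidden units; V carries the linear path.
    def matvec(M, v):
        return [sum(m * v_j for m, v_j in zip(row, v)) for row in M]
    hidden = [gelu(h) * g for h, g in zip(matvec(W, x), matvec(V, x))]
    return matvec(W_out, hidden)

# Tiny 2-dimensional example with identity weights, for illustration:
identity = [[1.0, 0.0], [0.0, 1.0]]
out = geglu_ffn([1.0, 2.0], identity, identity, identity)
```

Compared with the plain ReLU feed-forward block of the original T5, the gated variant adds a second projection but tends to improve quality at the same parameter budget.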
- Uses WordPiece tokenization with a 32,000-token vocabulary
- Case-sensitive text processing
- Trained with the Adafactor optimizer and learning rate warmup
- Input and output sequences of 512 consecutive tokens
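The span-based masked language modeling objective can be illustrated with a small self-contained sketch. The sentinel names follow T5's `<extra_id_N>` convention; for clarity the spans here are fixed rather than randomly sampled as in real pretraining:

```python
def corrupt_spans(tokens, spans):
    """Replace each (start, length) span with a sentinel token and
    collect the masked-out tokens as the decoder target (T5-style)."""
    inputs, targets = [], []
    pos, sentinel = 0, 0
    for start, length in sorted(spans):
        inputs.extend(tokens[pos:start])          # keep unmasked tokens
        inputs.append(f"<extra_id_{sentinel}>")   # sentinel marks the gap
        targets.append(f"<extra_id_{sentinel}>")
        targets.extend(tokens[start:start + length])
        pos = start + length
        sentinel += 1
    inputs.extend(tokens[pos:])
    targets.append(f"<extra_id_{sentinel}>")      # final closing sentinel
    return inputs, targets

tokens = "Helsinki on Suomen pääkaupunki".split()
inputs, targets = corrupt_spans(tokens, [(1, 1), (3, 1)])
# inputs:  ['Helsinki', '<extra_id_0>', 'Suomen', '<extra_id_1>']
# targets: ['<extra_id_0>', 'on', '<extra_id_1>', 'pääkaupunki', '<extra_id_2>']
```

The encoder sees the corrupted input and the decoder learns to emit the sentinels followed by the tokens each one replaced.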
Core Capabilities
- Achieves 94.40% accuracy on Yle News classification after fine-tuning
- Reaches 75.97% accuracy on the Eduskunta classification task
- Outperforms multilingual mT5 models on Finnish text classification
- Supports various downstream NLP tasks through fine-tuning
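Because T5 is a text-to-text model, classification is framed as generating the label string, and accuracy is exact match between the generated and gold labels. A minimal sketch of that framing (the task prefix and Finnish label names here are hypothetical, not taken from the actual evaluation setup):

```python
def to_text2text(example, task_prefix="classify news: "):
    # Hypothetical text-to-text formatting: prepend a task prefix to the
    # document and use the label name itself as the decoder target.
    return {"input": task_prefix + example["text"],
            "target": example["label"]}

def exact_match_accuracy(predictions, references):
    # A prediction counts as correct only if the generated label
    # string matches the gold label exactly (after trimming whitespace).
    matches = sum(p.strip() == r.strip()
                  for p, r in zip(predictions, references))
    return matches / len(references)

ex = to_text2text({"text": "Jääkiekon MM-kisat alkavat", "label": "urheilu"})
acc = exact_match_accuracy(["urheilu", "politiikka", "talous", "urheilu"],
                           ["urheilu", "politiikka", "kulttuuri", "urheilu"])
# acc == 0.75
```

The same recipe applies to any of the downstream tasks listed above: only the task prefix and the set of target label strings change.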
Frequently Asked Questions
Q: What makes this model unique?
The model's Deep-Narrow architecture with 36 transformer layers, combined with its exclusive focus on Finnish, makes it particularly effective for Finnish NLP tasks. It clearly outperforms multilingual models such as mT5 on Finnish-specific tasks.
Q: What are the recommended use cases?
The model requires task-specific fine-tuning before use. It is particularly suitable for Finnish text classification, text generation, and other downstream NLP tasks. For best results, fine-tune with full fp32 precision.