FRED-T5-1.7B
Property | Value
---|---
Model Size | 1.7B parameters |
Architecture | T5-based (24 layers, 1536 hidden size) |
Training Data | 300GB Russian language corpus |
License | Apache 2.0 |
What is FRED-T5-1.7B?
FRED-T5-1.7B (Full-scale Russian Enhanced Denoisers T5) is a large language model developed by SberDevices specifically for Russian language processing. The model represents a significant advancement in Russian NLP: it was trained on a mixture of seven denoisers, similar to the UL2 architecture.
Implementation Details
The model features a BPE tokenizer with 50,257 base tokens plus 107 special tokens, including task-specific prefix tokens such as '<LM>' and '<SC1>' through '<SC6>'. Training took approximately 45 days on 112 A100 GPUs and followed a two-phase approach in which the initial phase used a smaller subset of the dataset. A loading sketch follows the feature list below.
- 24-layer architecture with 1536 hidden size
- Trained on 300GB Russian language corpus
- Implements multiple denoising objectives
- Specialized prefix tokens for different tasks
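The snippet below is a minimal loading sketch, assuming the checkpoint is distributed through Hugging Face transformers. The checkpoint id `ai-forever/FRED-T5-1.7B` and the use of the `GPT2Tokenizer` class for the BPE vocabulary are assumptions not stated in this card; verify them against the official model page.

```python
# Minimal loading sketch. The Hub checkpoint id and the GPT2Tokenizer class
# are assumptions -- check the official model card before relying on them.
import torch
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

MODEL_NAME = "ai-forever/FRED-T5-1.7B"  # assumed checkpoint id

# FRED-T5's BPE vocabulary (50,257 base + 107 special tokens) loads via a
# GPT-2-style BPE tokenizer rather than T5's default SentencePiece tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained(MODEL_NAME, eos_token="</s>")
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
model.eval()
```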
Core Capabilities
- Advanced text generation in Russian
- Multiple denoising tasks support
- Flexible prefix-based task conditioning (see the sketch after this list)
- Optimized for Russian SuperGLUE benchmark tasks
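Continuing from the loading sketch above, the example below illustrates prefix-based task conditioning: prepending '<LM>' asks the model for left-to-right continuation rather than denoising. The prompt text is illustrative only.

```python
# Prefix-based conditioning: '<LM>' selects the language-modeling task.
# Prompt (Russian for "Moscow is the capital") is purely illustrative.
prompt = "<LM>Москва — столица"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=32,
        eos_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```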
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its specialized training approach, which combines seven denoisers with a two-phase training strategy and is optimized specifically for Russian language tasks. Its multiple prefix tokens allow versatile task handling.
Q: What are the recommended use cases?
FRED-T5-1.7B is particularly well-suited for Russian language processing tasks, including text generation, completion, and various conditional generation tasks. Its multiple prefix tokens make it adaptable for different NLP applications.
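As a conditional-generation sketch, the '<SC1>' prefix can invoke one of the span-corruption denoisers. The T5-style '<extra_id_0>' sentinel marking the gap is an assumption based on the UL2-like denoising training described above, not something this card confirms.

```python
# Span infilling with a denoiser prefix: '<SC1>' selects a span-corruption
# denoiser and '<extra_id_0>' marks the gap to fill (sentinel usage is an
# assumption). Text: "The weather today <gap>, so we stayed home."
text = "<SC1>Погода сегодня <extra_id_0>, поэтому мы остались дома."
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```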