FRED-T5-1.7B
Property | Value
---|---
Model Size | 1.7B parameters |
Architecture | T5-based (24 layers, 1536 hidden size) |
Training Data | 300GB Russian language corpus |
License | Apache 2.0 |
What is FRED-T5-1.7B?
FRED-T5-1.7B (Full-scale Russian Enhanced Denoisers T5) is a large language model developed by SberDevices specifically for Russian language processing. The model represents a significant advancement in Russian NLP: it was trained on a mixture of seven denoisers, similar to the UL2 architecture.
Implementation Details
The model features a BPE tokenizer with 50,257 base tokens plus 107 special tokens, including task-specific prefix tokens such as '<LM>' and '<SC1>' through '<SC6>'. Training took approximately 45 days on 112 A100 GPUs and followed a two-phase approach in which the initial phase used a smaller subset of the dataset. A loading sketch follows the feature list below.
- 24-layer architecture with 1536 hidden size
- Trained on 300GB Russian language corpus
- Implements multiple denoising objectives
- Specialized prefix tokens for different tasks
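The snippet below is a minimal loading sketch, assuming the checkpoint is distributed through Hugging Face transformers. The checkpoint id `ai-forever/FRED-T5-1.7B` and the use of the `GPT2Tokenizer` class for the BPE vocabulary are assumptions not stated in this card; verify them against the official model page.

```python
# Minimal loading sketch. The Hub checkpoint id and the GPT2Tokenizer class
# are assumptions -- check the official model card before relying on them.
import torch
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

MODEL_NAME = "ai-forever/FRED-T5-1.7B"  # assumed checkpoint id

# FRED-T5's BPE vocabulary (50,257 base + 107 special tokens) loads via a
# GPT-2-style BPE tokenizer rather than T5's default SentencePiece tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained(MODEL_NAME, eos_token="</s>")
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
model.eval()
```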
Core Capabilities
- Advanced text generation in Russian
- Multiple denoising tasks support
- Flexible prefix-based task conditioning (see the sketch after this list)
- Optimized for Russian SuperGLUE benchmark tasks
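Continuing from the loading sketch above, the example below illustrates prefix-based task conditioning: prepending '<LM>' asks the model for left-to-right continuation rather than denoising. The prompt text is illustrative only.

```python
# Prefix-based conditioning: '<LM>' selects the language-modeling task.
# Prompt (Russian for "Moscow is the capital") is purely illustrative.
prompt = "<LM>Москва — столица"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=32,
        eos_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```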
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its specialized training approach, which combines seven denoisers with a two-phase training strategy and is optimized specifically for Russian language tasks. Its multiple prefix tokens allow versatile task handling.
Q: What are the recommended use cases?
FRED-T5-1.7B is particularly well-suited for Russian language processing tasks, including text generation, completion, and various conditional generation tasks. Its multiple prefix tokens make it adaptable for different NLP applications.
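As a conditional-generation sketch, the '<SC1>' prefix can invoke one of the span-corruption denoisers. The T5-style '<extra_id_0>' sentinel marking the gap is an assumption based on the UL2-like denoising training described above, not something this card confirms.

```python
# Span infilling with a denoiser prefix: '<SC1>' selects a span-corruption
# denoiser and '<extra_id_0>' marks the gap to fill (sentinel usage is an
# assumption). Text: "The weather today <gap>, so we stayed home."
text = "<SC1>Погода сегодня <extra_id_0>, поэтому мы остались дома."
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```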