ruGPT-3.5-13B
Property | Value |
---|---|
Parameters | 13 Billion |
License | MIT |
Languages | Russian, English |
Training Data | 300 GB general corpus + 100 GB additional code and legal documents |
Training Infrastructure | 512 V100 GPUs (pretraining), 200 A100 GPUs (fine-tuning) |
What is ruGPT-3.5-13B?
ruGPT-3.5-13B is a 13-billion-parameter language model built primarily for Russian, developed by ai-forever. It is their largest model to date and serves as the foundation for GigaChat. The model generates text in both Russian and English. It was trained on a 300 GB corpus of diverse content plus an additional 100 GB of code and legal documents.
Implementation Details
The model was trained with the DeepSpeed and Megatron libraries. The initial training phase used a dataset of 300B tokens for 3 epochs and took approximately 45 days on 512 V100 GPUs. It was followed by a fine-tuning phase on the additional data with a sequence length of 2048 tokens, which ran for 20 days on 200 A100 GPUs. The final model reached a perplexity of 8.8 on Russian text.
- Training data deduplicated by hashing each text and keeping only texts with a unique hash
- Documents filtered by text compression rate, measured with zlib
- Specialized fine-tuning on code and legal documents
- Optimized for both Russian and English language processing
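The deduplication and compression-filtering steps listed above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the choice of a truncated SHA-256 as the hash, and the `low`/`high` thresholds are all assumptions made for illustration. Only the general ideas (keep texts with a unique hash, filter by zlib compression rate) come from the description above.

```python
import hashlib
import zlib


def text_hash64(text: str) -> int:
    """Return a 64-bit hash of the text.

    Assumption: the first 8 bytes of SHA-256 are used; the source does not
    say which hash function the authors chose.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def deduplicate(texts):
    """Keep the first occurrence of each text, comparing texts by hash."""
    seen = set()
    unique = []
    for text in texts:
        h = text_hash64(text)
        if h not in seen:
            seen.add(h)
            unique.append(text)
    return unique


def compression_ratio(text: str) -> float:
    """Raw byte size divided by zlib-compressed size.

    Highly repetitive text compresses well and gets a high ratio;
    very short or noisy text gets a low one.
    """
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def filter_by_compression(texts, low=1.2, high=8.0):
    """Drop texts whose ratio falls outside [low, high].

    The default thresholds are hypothetical, not values from the source.
    """
    return [t for t in texts if low <= compression_ratio(t) <= high]
```

In a pipeline of this shape, `deduplicate` would run first so that the compression filter only has to score each distinct text once. Suitable thresholds depend on the corpus and would need to be tuned on a sample of the data.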
Core Capabilities
- Advanced text generation in Russian and English
- Poetry and creative writing capabilities
- Technical and scientific content generation
- Historical and factual information processing
- Legal and code-related text processing
Frequently Asked Questions
Q: What makes this model unique?
The model is trained primarily for Russian while retaining English capability. Its training data combines 300 GB of diverse content with 100 GB of code and legal documents, so it covers both general and domain-specific text.
Q: What are the recommended use cases?
Recommended applications include creative writing, technical documentation, legal document processing, and code-related tasks. It is best suited to applications that need fluent, idiomatic Russian together with English support.