rugpt3medium_based_on_gpt2
| Property | Value |
|---|---|
| Training Data | 80B tokens |
| Training Duration | 16 days on 64 GPUs |
| Context Window | 2048 tokens |
| Test Perplexity | 17.4 |
| Paper | arXiv:2309.10931 |
| Author | ai-forever (SberDevices team) |
What is rugpt3medium_based_on_gpt2?
rugpt3medium_based_on_gpt2 is a Russian language model based on the GPT-2 architecture, developed by the SberDevices team and published under the ai-forever organization. It was pretrained on 80B tokens over 3 epochs.
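The checkpoint can be loaded with the standard Transformers API. A minimal sketch, assuming the model is published on the Hugging Face Hub as `ai-forever/rugpt3medium_based_on_gpt2`:

```python
# Minimal loading sketch; assumes the checkpoint id "ai-forever/rugpt3medium_based_on_gpt2"
# on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ai-forever/rugpt3medium_based_on_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```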
Implementation Details
The model was pretrained with a 1024-token sequence length using the Transformers library and then fine-tuned to handle a 2048-token context. Training took about 16 days on 64 GPUs in total.
- Initial pretraining with a 1024-token sequence length
- Fine-tuned for a 2048-token context window
- Implemented using the Transformers library
- Achieved 17.4 perplexity on the test set (see the evaluation sketch below)
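The 17.4 figure was computed by the authors on their own test set, which is not reproduced here. The following is only a sketch of the calculation itself: perplexity is the exponential of the mean next-token cross-entropy loss, and the example text is a placeholder.

```python
# Perplexity sketch: exp(mean next-token cross-entropy) on a short held-out text.
# The text below is a placeholder, not the authors' test corpus.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ai-forever/rugpt3medium_based_on_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Пример текста для оценки перплексии модели."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss
    # over shifted next-token predictions.
    outputs = model(**inputs, labels=inputs["input_ids"])

perplexity = torch.exp(outputs.loss)
print(f"perplexity: {perplexity.item():.2f}")
```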
Core Capabilities
- Advanced Russian language understanding and generation
- Extended context window handling (2048 tokens)
- Optimized for Russian language tasks
- Strong reported performance (17.4 test perplexity)
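The advertised 2048-token context window can be checked against the published model configuration. This is a quick sketch, assuming the config on the Hub reflects the 2048-token fine-tuning stage described above:

```python
# Inspect the position-embedding size; 2048 is expected if the published
# config matches the fine-tuned context window.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ai-forever/rugpt3medium_based_on_gpt2")
print(config.n_positions)
```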
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its specialized focus on Russian, its 80B-token pretraining corpus, and its 17.4 test perplexity, making it well suited to Russian language tasks.
Q: What are the recommended use cases?
The model is well suited to Russian language processing tasks, including text generation, completion, and understanding. Its 2048-token context window also makes it effective for longer text sequences.
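A short generation sketch follows; the prompt and sampling parameters are arbitrary illustrative choices, not settings recommended by the authors.

```python
# Illustrative text-generation sketch with arbitrary sampling parameters.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ai-forever/rugpt3medium_based_on_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Искусственный интеллект в России"  # "Artificial intelligence in Russia"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

output_ids = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```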