# mGPT-armenian
| Property | Value |
|---|---|
| Parameters | 1.3 billion |
| License | Apache 2.0 |
| Paper | arXiv:2204.07580 |
| Training Data | 170GB Armenian texts |
| Architecture | GPT-3-based with sparse attention |
## What is mGPT-armenian?
mGPT-armenian is a monolingual, GPT-3-based model specialized for the Armenian language. It builds on the mGPT architecture, which was originally trained on 60 languages from 25 language families, and combines that large-scale multilingual foundation with dedicated training on Armenian text data.
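If the checkpoint is published on the Hugging Face Hub, loading it should follow the standard transformers pattern. A minimal sketch, assuming a hypothetical repo id of `ai-forever/mGPT-1.3B-armenian` (the card does not state the exact identifier):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub repo id; substitute the actual checkpoint location.
model_id = "ai-forever/mGPT-1.3B-armenian"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()  # inference mode
```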
## Implementation Details
The model uses the DeepSpeed and Megatron-LM frameworks for efficient training and inference. The underlying multilingual model was pre-trained for 12 days on 256 Tesla V100 GPUs (4 epochs), followed by 9 days on 64 GPUs (1 additional epoch). The Armenian fine-tuning stage then took approximately 7 days on 4 Tesla V100 GPUs, completing 160,000 steps.
- The training corpus includes MC4, Archive.org fiction, EANC public data, OpenSubtitles, and the OSCAR corpus
- Achieved a validation perplexity of 2.046 (see the evaluation sketch after this list)
- Uses a sparse attention mechanism during training, with the final tuning stage run without sparsity
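The perplexity figure is the exponentiated mean token-level cross-entropy on held-out Armenian text. A minimal sketch of how such a score can be reproduced with the transformers API, assuming `model` and `tokenizer` are loaded as shown earlier (the evaluation sentence is a placeholder, not the actual validation set):

```python
import math

import torch

def perplexity(model, tokenizer, text: str) -> float:
    """Exponentiated mean token-level cross-entropy of `text` under a causal LM."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the shifted LM loss.
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

print(perplexity(model, tokenizer, "Երևանը Հայաստանի մայրաքաղաքն է։"))
```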
## Core Capabilities
- High-quality Armenian text generation (see the sampling sketch after this list)
- Advanced Armenian language understanding and processing
- Efficient inference through an optimized (sparse) attention mechanism
- Performance comparable to XGLM models
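As an illustration of the generation capability, a short sampling sketch; the prompt and decoding parameters are arbitrary choices, not settings recommended by the model authors:

```python
prompt = "Հայաստանի մայրաքաղաքը"  # "The capital of Armenia..."
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,     # nucleus sampling rather than greedy decoding
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```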
## Frequently Asked Questions
**Q: What makes this model unique?**
A: Its uniqueness lies in the specialized focus on Armenian, combined with a state-of-the-art architecture and extensive training on diverse Armenian text sources. It is one of the first large-scale language models specifically optimized for Armenian.
**Q: What are the recommended use cases?**
A: The model is particularly suited to Armenian text generation and other natural language processing applications, and it can be fine-tuned for downstream tasks that require a deep understanding of Armenian language patterns; a fine-tuning sketch follows.
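For downstream adaptation, a hedged sketch using the transformers Trainer; the toy dataset, hyperparameters, and output directory are illustrative only and are not taken from the original training recipe:

```python
from datasets import Dataset
from transformers import (
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Toy corpus; replace with your own Armenian texts.
raw = Dataset.from_dict({"text": ["Օրինակ հայերեն նախադասություն։"]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False -> causal LM objective: labels are a shifted copy of the inputs.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="mgpt-armenian-finetuned",  # illustrative path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    fp16=True,  # assumes a CUDA-capable GPU
)

Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=train_dataset,
).train()
```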