mGPT: Multilingual GPT Model
| Property | Value |
|---|---|
| Parameters | 1.3 billion |
| License | Apache 2.0 |
| Training Data | 488B UTF characters |
| Languages | 61 |
| Paper | arXiv:2204.07580 |
What is mGPT?
mGPT is a large-scale multilingual autoregressive language model that extends the GPT architecture to 61 languages from 25 language families. Developed by ai-forever, it was trained on Wikipedia and the multilingual Colossal Clean Crawled Corpus (mC4).
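To try the model, a minimal loading sketch with the Hugging Face transformers library is shown below. The repository id "ai-forever/mGPT" is an assumption based on the developer name; verify the exact id on the Hugging Face Hub before use.

```python
# Minimal sketch: load mGPT with Hugging Face transformers.
# The model id "ai-forever/mGPT" is assumed from the developer name; verify it on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai-forever/mGPT")
model = AutoModelForCausalLM.from_pretrained("ai-forever/mGPT")
model.eval()  # inference mode for generation
```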
Implementation Details
The model was trained with the DeepSpeed and Megatron frameworks, enabling efficient parallel training across 256 Nvidia V100 GPUs over 14 days. Training processed 440 billion BPE tokens with a sequence length of 512 (see the sketch after the list below).
- Architecture follows the GPT-3 design, implemented on top of the GPT-2 code base
- Uses a sparse attention mechanism
- Training corpus size: 488 billion UTF characters
- Optimized for both high-resource and low-resource languages
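Because training used a sequence length of 512 BPE tokens, longer prompts should be truncated before generation. A short sketch, again assuming the "ai-forever/mGPT" Hub id:

```python
# Sketch: enforce the 512-token training context when encoding a prompt.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai-forever/mGPT")  # assumed Hub id

text = "Пример многоязычного текста для mGPT."  # Russian example sentence
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
print(inputs["input_ids"].shape)  # (1, n) with n <= 512 BPE token ids
```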
Core Capabilities
- Supports 61 languages including Arabic, English, Russian, Japanese, and many low-resource languages
- Performs on par with XGLM models while covering more languages
- Enables few-shot, in-context learning across multiple languages (see the sketch after this list)
- Suitable for text generation tasks in diverse languages
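As a concrete illustration of few-shot use, the sketch below builds an in-context prompt and lets the model complete it. The demonstrations are illustrative, not drawn from the paper, and the Hub id is assumed as above.

```python
# Sketch: few-shot (in-context) prompting with mGPT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai-forever/mGPT")  # assumed Hub id
model = AutoModelForCausalLM.from_pretrained("ai-forever/mGPT")
model.eval()

# A few English -> French demonstrations followed by an unfinished query.
prompt = (
    "English: cat -> French: chat\n"
    "English: dog -> French: chien\n"
    "English: house -> French:"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```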
Frequently Asked Questions
Q: What makes this model unique?
mGPT stands out for its extensive language coverage (61 languages) and its efficient training setup based on DeepSpeed and Megatron. It is particularly useful for low-resource languages while remaining competitive with comparable multilingual models.
Q: What are the recommended use cases?
The model is ideal for multilingual text generation tasks, few-shot learning scenarios, and applications requiring natural language understanding across multiple languages. It's particularly valuable for organizations needing to process or generate text in multiple languages, including low-resource ones.
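A minimal open-ended generation sketch with the transformers pipeline API, under the same assumption about the Hub id:

```python
# Sketch: multilingual generation via the text-generation pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="ai-forever/mGPT")  # assumed Hub id

# Prompts in two of the supported languages (French and Russian).
for prompt in ["Il était une fois", "Жил-был кот, который"]:
    out = generator(prompt, max_new_tokens=30, do_sample=True, top_p=0.95)
    print(out[0]["generated_text"])
```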