mGPT: Multilingual GPT Model
| Property | Value |
|---|---|
| Parameters | 1.3 billion |
| License | Apache 2.0 |
| Training Data | 488B UTF characters |
| Languages | 61 |
| Paper | arXiv:2204.07580 |
What is mGPT?
mGPT is a large-scale multilingual autoregressive language model that extends the GPT architecture to 61 languages from 25 language families. Developed by ai-forever, it was trained on Wikipedia and the multilingual Colossal Clean Crawled Corpus (mC4).
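To try the model, a minimal loading sketch with the Hugging Face transformers library is shown below. The repository id "ai-forever/mGPT" is an assumption based on the developer name; verify the exact id on the Hugging Face Hub before use.

```python
# Minimal sketch: load mGPT with Hugging Face transformers.
# The model id "ai-forever/mGPT" is assumed from the developer name; verify it on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai-forever/mGPT")
model = AutoModelForCausalLM.from_pretrained("ai-forever/mGPT")
model.eval()  # inference mode for generation
```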
Implementation Details
The model was trained with the DeepSpeed and Megatron frameworks, enabling efficient parallel training across 256 Nvidia V100 GPUs over 14 days. Training processed 440 billion BPE tokens with a sequence length of 512 (see the sketch after the list below).
- Architecture follows the GPT-3 design, implemented on top of the GPT-2 code base
- Uses a sparse attention mechanism
- Training corpus size: 488 billion UTF characters
- Optimized for both high-resource and low-resource languages
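Because training used a sequence length of 512 BPE tokens, longer prompts should be truncated before generation. A short sketch, again assuming the "ai-forever/mGPT" Hub id:

```python
# Sketch: enforce the 512-token training context when encoding a prompt.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai-forever/mGPT")  # assumed Hub id

text = "Пример многоязычного текста для mGPT."  # Russian example sentence
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
print(inputs["input_ids"].shape)  # (1, n) with n <= 512 BPE token ids
```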
Core Capabilities
- Supports 61 languages including Arabic, English, Russian, Japanese, and many low-resource languages
- Performs on par with XGLM models while covering more languages
- Enables few-shot, in-context learning across multiple languages (see the sketch after this list)
- Suitable for text generation tasks in diverse languages
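As a concrete illustration of few-shot use, the sketch below builds an in-context prompt and lets the model complete it. The demonstrations are illustrative, not drawn from the paper, and the Hub id is assumed as above.

```python
# Sketch: few-shot (in-context) prompting with mGPT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai-forever/mGPT")  # assumed Hub id
model = AutoModelForCausalLM.from_pretrained("ai-forever/mGPT")
model.eval()

# A few English -> French demonstrations followed by an unfinished query.
prompt = (
    "English: cat -> French: chat\n"
    "English: dog -> French: chien\n"
    "English: house -> French:"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```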
Frequently Asked Questions
Q: What makes this model unique?
mGPT stands out for its extensive language coverage (61 languages) and its efficient training setup based on DeepSpeed and Megatron. It is particularly useful for low-resource languages while remaining competitive with comparable multilingual models.
Q: What are the recommended use cases?
The model is ideal for multilingual text generation tasks, few-shot learning scenarios, and applications requiring natural language understanding across multiple languages. It's particularly valuable for organizations needing to process or generate text in multiple languages, including low-resource ones.
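A minimal open-ended generation sketch with the transformers pipeline API, under the same assumption about the Hub id:

```python
# Sketch: multilingual generation via the text-generation pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="ai-forever/mGPT")  # assumed Hub id

# Prompts in two of the supported languages (French and Russian).
for prompt in ["Il était une fois", "Жил-был кот, который"]:
    out = generator(prompt, max_new_tokens=30, do_sample=True, top_p=0.95)
    print(out[0]["generated_text"])
```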