mGPT

ai-forever

Multilingual GPT model with 1.3B parameters supporting 61 languages, trained on Wikipedia and the mC4 corpus using the DeepSpeed and Megatron frameworks.

| Property | Value |
|---|---|
| Parameters | 1.3 Billion |
| License | Apache 2.0 |
| Training Data | 488B UTF characters |
| Languages | 61 |
| Paper | arXiv:2204.07580 |

What is mGPT?

mGPT is a large-scale multilingual autoregressive language model that extends GPT-like architectures to 61 languages from 25 language families. Developed by ai-forever, it represents a significant advance in multilingual AI capabilities, trained on both Wikipedia and the multilingual Colossal Clean Crawled Corpus (mC4).

Implementation Details

The model was trained using a combination of the DeepSpeed and Megatron frameworks, enabling efficient parallel training across 256 NVIDIA V100 GPUs over 14 days. The training process involved 440 billion BPE tokens with a sequence length of 512.

  • Architecture follows the GPT-3 design, implemented on top of the GPT-2 code base
  • Implements sparse attention mechanism
  • Training corpus size: 488 billion UTF characters
  • Optimized for both high-resource and low-resource languages

Core Capabilities

  • Supports 61 languages including Arabic, English, Russian, Japanese, and many low-resource languages
  • Performs on par with XGLM models while covering more languages
  • Enables few-shot learning across multiple languages
  • Suitable for text generation tasks in diverse languages
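Since the checkpoint is published on the Hugging Face Hub as `ai-forever/mGPT`, the generation capabilities above can be sketched with the `transformers` library. The sampling settings below are illustrative assumptions, not the authors' recommended configuration:

```python
def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a continuation of `prompt` with mGPT (illustrative settings)."""
    # Imports kept inside the function so the sketch can be read and tested
    # without downloading the 1.3B-parameter checkpoint.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("ai-forever/mGPT")
    model = AutoModelForCausalLM.from_pretrained("ai-forever/mGPT")
    inputs = tokenizer(prompt, return_tensors="pt")
    # Nucleus sampling; max_new_tokens and top_p are arbitrary choices here.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=True, top_p=0.95)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # The same call works for any of the 61 supported languages,
    # since a single BPE vocabulary covers them all.
    print(generate("Народная мудрость гласит:"))
```

Because one shared tokenizer spans all 61 languages, no per-language configuration is needed; only the prompt changes.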

Frequently Asked Questions

Q: What makes this model unique?

mGPT stands out for its extensive language coverage (61 languages) and efficient training approach using DeepSpeed and Megatron. It particularly enhances NLP capabilities for low-resource languages while maintaining competitive performance with similar multilingual models.

Q: What are the recommended use cases?

The model is ideal for multilingual text generation tasks, few-shot learning scenarios, and applications requiring natural language understanding across multiple languages. It's particularly valuable for organizations needing to process or generate text in multiple languages, including low-resource ones.
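As a concrete sketch of the few-shot scenario, demonstrations in one language can be mixed with a query in another inside a single prompt. The sentiment task, labels, and helper function below are illustrative assumptions, not part of the mGPT release:

```python
def build_few_shot_prompt(examples, query):
    """Concatenate (text, label) demonstrations, then the unlabeled query."""
    lines = [f"Text: {text}\nLabel: {label}" for text, label in examples]
    lines.append(f"Text: {query}\nLabel:")
    return "\n\n".join(lines)

# Demonstrations in English and French, query in Russian: the model is
# expected to continue the pattern with a label for the final text.
demos = [
    ("I loved this film.", "positive"),
    ("Ce film était terrible.", "negative"),
]
prompt = build_few_shot_prompt(demos, "Этот фильм прекрасен.")
```

The resulting prompt ends with a bare `Label:` cue, so the model's next-token continuation serves as the prediction.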
