mGPT

Maintained by: ai-forever

mGPT: Multilingual GPT Model

Property         Value
Parameters       1.3 billion
License          Apache 2.0
Training Data    488 billion UTF characters
Languages        61
Paper            arXiv:2204.07580

What is mGPT?

mGPT is a multilingual autoregressive language model that extends GPT-style architectures to 61 languages from 25 language families. Developed by ai-forever, it was trained on Wikipedia and the Colossal Clean Crawled Corpus (mC4).

Implementation Details

The model was trained with the DeepSpeed and Megatron frameworks, which parallelized training across 256 NVIDIA V100 GPUs for about 14 days. In total, the model saw 440 billion BPE tokens at a sequence length of 512.
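To put these figures in perspective, the back-of-the-envelope calculation below derives the number of 512-token sequences, the total GPU-hours, and the average throughput per GPU. It is a rough sketch: it treats the 14 days as exact wall-clock time with all 256 GPUs busy throughout, which the source does not confirm, and the variable names are illustrative only.

```python
# Figures quoted in the text above.
TOTAL_TOKENS = 440_000_000_000  # BPE tokens seen during training
SEQ_LEN = 512                   # tokens per training sequence
NUM_GPUS = 256                  # NVIDIA V100 GPUs
TRAIN_DAYS = 14                 # assumed to be exact wall-clock time

# Number of full-length sequences those tokens correspond to.
num_sequences = TOTAL_TOKENS // SEQ_LEN

# Total compute budget in GPU-hours, assuming every GPU ran the whole time.
gpu_hours = NUM_GPUS * TRAIN_DAYS * 24

# Average throughput, overall and per GPU.
train_seconds = TRAIN_DAYS * 24 * 60 * 60
tokens_per_second = TOTAL_TOKENS / train_seconds
tokens_per_gpu_second = tokens_per_second / NUM_GPUS

print(f"sequences:          {num_sequences:,}")
print(f"GPU-hours:          {gpu_hours:,}")
print(f"tokens/s (cluster): {tokens_per_second:,.0f}")
print(f"tokens/s (per GPU): {tokens_per_gpu_second:,.1f}")
```

Under these assumptions the run covers roughly 859 million sequences and about 86 thousand GPU-hours. The real throughput would differ if the run included restarts or idle time.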

  • Reproduces the GPT-3 architecture using the GPT-2 source code
  • Implements sparse attention mechanism
  • Training corpus size: 488 billion UTF characters
  • Optimized for both high-resource and low-resource languages

Core Capabilities

  • Supports 61 languages including Arabic, English, Russian, Japanese, and many low-resource languages
  • Performs on par with XGLM models while covering more languages
  • Enables few-shot learning across multiple languages
  • Suitable for text generation tasks in diverse languages
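As an illustration of few-shot text generation, the sketch below builds a small translation-style prompt and passes it to the model through the Hugging Face transformers library. Several things here are assumptions rather than details from this page: the Hub identifier `ai-forever/mGPT`, the use of `GPT2LMHeadModel` and `GPT2Tokenizer` (chosen because the model builds on GPT-2 sources), and the `Input:`/`Output:` prompt layout, which is just one possible format. The helper names `build_few_shot_prompt` and `generate` are hypothetical.

```python
from typing import List, Tuple

# Assumed Hugging Face Hub identifier; not stated on this page.
MODEL_ID = "ai-forever/mGPT"


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format (input, output) demonstrations followed by an unanswered query."""
    blocks = [f"Input: {src}\nOutput: {tgt}" for src, tgt in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Continue `prompt` with the model; downloads the weights on first use."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(MODEL_ID)
    model = GPT2LMHeadModel.from_pretrained(MODEL_ID)
    # Keep the prompt well under the 512-token training sequence length.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# English-to-Russian demonstrations, then a query for the model to complete.
demos = [("cat", "кошка"), ("dog", "собака")]
prompt = build_few_shot_prompt(demos, "house")
print(prompt)

# Uncomment to run the model (requires transformers, torch, and a large download):
# print(generate(prompt))
```

Greedy decoding (`do_sample=False`) keeps the output deterministic for short completions; sampling parameters may suit open-ended generation better.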

Frequently Asked Questions

Q: What makes this model unique?

mGPT stands out for its language coverage (61 languages) and its training setup built on DeepSpeed and Megatron. It extends NLP capabilities to low-resource languages while remaining competitive with comparable multilingual models such as XGLM.

Q: What are the recommended use cases?

The model is ideal for multilingual text generation tasks, few-shot learning scenarios, and applications requiring natural language understanding across multiple languages. It's particularly valuable for organizations needing to process or generate text in multiple languages, including low-resource ones.
