mGPT-armenian

ai-forever

Armenian-language GPT model with 1.3B parameters, trained on 170GB of text data. Built on the mGPT architecture with a sparse attention mechanism.

Property         Value
Parameters       1.3 billion
License          Apache 2.0
Paper            arXiv:2204.07580
Training Data    170GB Armenian texts
Architecture     GPT-3-based with sparse attention

What is mGPT-armenian?

mGPT-armenian is a monolingual GPT-3-based model for the Armenian language. It builds on mGPT, a multilingual model that was initially trained on 60 languages from 25 language families, and adapts it to Armenian through further training on Armenian text data.

Implementation Details

The model uses the DeepSpeed and Megatron frameworks for efficient training and inference. The underlying mGPT model was first pre-trained for 12 days on 256 Tesla V100 GPUs (4 epochs), followed by 9 days on 64 GPUs (1 epoch). The Armenian fine-tuning then took approximately 7 days on 4 Tesla V100 GPUs, completing 160,000 steps.
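As a rough sense of scale, the durations and GPU counts above can be converted into GPU-days. This is a back-of-the-envelope sketch: it assumes every GPU was in use for the full stated duration, which the source does not confirm, and the stage labels are chosen here for illustration.

```python
# Back-of-the-envelope compute estimate from the figures quoted above.
# Assumption: all GPUs were busy for the entire stated duration.
STAGES = [
    # (label, days, gpus); labels are illustrative, not from the source
    ("pre-training, stage 1", 12, 256),
    ("pre-training, stage 2", 9, 64),
    ("Armenian fine-tuning", 7, 4),
]

def gpu_days(days: int, gpus: int) -> int:
    """GPU-days consumed by one stage: wall-clock days times GPU count."""
    return days * gpus

totals = {label: gpu_days(days, gpus) for label, days, gpus in STAGES}
pretraining = totals["pre-training, stage 1"] + totals["pre-training, stage 2"]
finetuning = totals["Armenian fine-tuning"]

for label, value in totals.items():
    print(f"{label}: {value} GPU-days")
print(f"fine-tuning share of total: {finetuning / (pretraining + finetuning):.2%}")
```

Under that assumption, the Armenian fine-tuning accounts for 28 GPU-days against 3,648 GPU-days of pre-training, which shows how little additional compute the language-specific stage required.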

  • The training corpus includes mC4, Archive.org fiction, EANC public data, OpenSubtitles, and OSCAR
  • Achieved a validation perplexity of 2.046
  • Uses a sparse attention mechanism, with the final tuning stage run without sparsity
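The validation perplexity quoted above can be related to an average per-token loss. This is a minimal sketch, assuming the figure is the usual exponential of the mean cross-entropy in nats; the source does not state the unit or the tokenization, and the function names are illustrative.

```python
import math

def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood (natural log)."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

def mean_loss_from_perplexity(ppl: float) -> float:
    """Invert the definition: mean cross-entropy in nats per token."""
    return math.log(ppl)

# Reported validation perplexity from the list above.
reported = 2.046
loss = mean_loss_from_perplexity(reported)
print(f"mean cross-entropy: {loss:.3f} nats/token")

# Toy check: a model assigning probability 0.5 to every token has perplexity 2.
toy = perplexity([math.log(0.5)] * 10)
print(f"toy perplexity: {toy:.1f}")
```

Read this way, a perplexity near 2 means the model is, on average, about as uncertain as a choice between two equally likely tokens at each step.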

Core Capabilities

  • High-quality Armenian text generation
  • Armenian language understanding and processing
  • Efficient computation through a sparse attention mechanism
  • Performance comparable to XGLM models, as reported for the base mGPT model
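To illustrate what a sparse attention pattern looks like, here is a minimal sketch of a causal mask that combines a local window with strided positions. The pattern is an assumption made for illustration: the source does not say which sparsity layout mGPT uses, and `window` and `stride` are hypothetical parameters.

```python
def sparse_causal_mask(seq_len: int, window: int, stride: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Illustrative pattern (an assumption, not mGPT's documented layout):
    each position sees the previous `window` positions (including itself)
    plus every earlier position whose index is a multiple of `stride`.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i              # never attend to the future
            local = i - j < window       # recent neighbours
            strided = j % stride == 0    # periodic "summary" positions
            row.append(causal and (local or strided))
        mask.append(row)
    return mask

def density(mask: list[list[bool]]) -> float:
    """Fraction of allowed pairs relative to a dense causal mask."""
    n = len(mask)
    allowed = sum(cell for row in mask for cell in row)
    return allowed / (n * (n + 1) // 2)

mask = sparse_causal_mask(seq_len=16, window=4, stride=8)
for row in mask:
    print("".join("x" if cell else "." for cell in row))
print(f"density vs. dense causal attention: {density(mask):.2f}")
```

The saving comes from the number of allowed pairs growing more slowly than the full causal triangle as the sequence gets longer, at the cost of each position seeing only part of its history directly.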

Frequently Asked Questions

Q: What makes this model unique?

The model is distinguished by its exclusive focus on Armenian, combining a GPT-3-style architecture with training on diverse Armenian text sources. It is one of the first large-scale language models specifically optimized for Armenian.

Q: What are the recommended use cases?

The model is particularly suited to Armenian text generation and other natural language processing applications, and it can be used for downstream tasks that require a deep understanding of Armenian language patterns.
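For text generation, a minimal sketch using the Hugging Face `transformers` library might look like the following. The model identifier `ai-forever/mGPT-armenian` is inferred from the model and organization names on this page and is an assumption, as are the sampling settings and the helper name `generate_armenian`. Both `transformers` and `torch` need to be installed, and the first call downloads the weights.

```python
# Assumed Hub identifier, inferred from the names on this page; verify before use.
MODEL_ID = "ai-forever/mGPT-armenian"

# Illustrative sampling settings, not values recommended by the model authors.
GENERATION_KWARGS = {
    "max_new_tokens": 50,
    "do_sample": True,
    "top_p": 0.95,
    "temperature": 0.8,
}

def generate_armenian(prompt: str, model_id: str = MODEL_ID) -> str:
    """Continue an Armenian prompt with a causal language model."""
    # Imported here so the module loads even when the heavy
    # dependencies are not installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    # generate() returns prompt + continuation; decode the first sequence.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_armenian("Հայաստանի մայրաքաղաքը")` would return the prompt followed by a sampled continuation. Because sampling is enabled, the output differs between runs; setting `do_sample` to `False` gives deterministic greedy decoding instead.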
