mGPT-armenian

Maintained By
ai-forever

Property       Value
Parameters     1.3 billion
License        Apache 2.0
Paper          arXiv:2204.07580
Training Data  170 GB of Armenian text
Architecture   GPT-3-based with sparse attention

What is mGPT-armenian?

mGPT-armenian is a monolingual GPT-3-based model for the Armenian language. It builds on mGPT, a multilingual model originally trained on 60 languages from 25 language families, and adds further training on Armenian text. The goal is to pair a large pretrained language model with training focused specifically on Armenian data.

Implementation Details

The model uses the DeepSpeed and Megatron frameworks for efficient training and inference. The base model was first pre-trained for 12 days on 256 Tesla V100 GPUs for 4 epochs, followed by 9 days on 64 GPUs for 1 epoch. The Armenian fine-tuning then took approximately 7 days on 4 Tesla V100 GPUs and ran for 160,000 steps.
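To put the training figures above in perspective, they can be converted into GPU-days with simple arithmetic. The sketch below only multiplies the durations and GPU counts quoted in this section. The names `Stage` and `total_gpu_days` are illustrative, and treating every GPU as busy for the full duration is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage, using the figures quoted in the text."""
    name: str
    days: float
    gpus: int

    @property
    def gpu_days(self) -> float:
        # Assumption: every GPU was in use for the whole duration.
        return self.days * self.gpus


STAGES = [
    Stage("pre-training, stage 1 (4 epochs)", days=12, gpus=256),
    Stage("pre-training, stage 2 (1 epoch)", days=9, gpus=64),
    Stage("Armenian fine-tuning (160,000 steps)", days=7, gpus=4),
]


def total_gpu_days(stages):
    """Sum the GPU-days over all stages."""
    return sum(stage.gpu_days for stage in stages)


for stage in STAGES:
    print(f"{stage.name}: {stage.gpu_days:g} GPU-days")
print(f"total: {total_gpu_days(STAGES):g} GPU-days")
```

Under these assumptions the Armenian fine-tuning accounts for 28 of roughly 3,676 GPU-days, which is under 1% of the total compute.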

  • Training corpus includes MC4, Archive.org fiction, EANC public data, OpenSubtitles, and the OSCAR corpus
  • Achieved a validation perplexity of 2.046
  • Uses a sparse attention mechanism, with the final tuning stage run without sparsity
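The validation perplexity in the list above is conventionally the exponential of the mean per-token negative log-likelihood. The sketch below shows that relationship in plain Python. The function names are illustrative, the sample probabilities are made up, and it assumes the reported 2.046 is a per-token value computed with natural logarithms, which the source does not state.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def loss_from_perplexity(ppl):
    """Invert the relationship: mean loss in nats per token."""
    return math.log(ppl)


# Made-up probabilities for a four-token sequence (illustration only).
example = [math.log(p) for p in (0.5, 0.25, 0.5, 0.5)]
print(perplexity(example))

# Mean loss implied by the reported perplexity, under the stated assumption.
print(round(loss_from_perplexity(2.046), 3))
```

Read this way, a perplexity of 2.046 corresponds to a mean loss of about 0.716 nats per token.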

Core Capabilities

  • High-quality Armenian text generation
  • Advanced language understanding and processing
  • Efficient performance through optimized attention mechanisms
  • Performance comparable to XGLM models, as reported in the mGPT paper (arXiv:2204.07580)

Frequently Asked Questions

Q: What makes this model unique?

What sets the model apart is its focus on Armenian, combined with a GPT-3-based architecture and training on diverse Armenian text sources. It is one of the first large-scale language models specifically optimized for Armenian.

Q: What are the recommended use cases?

The model is best suited to Armenian text generation. It can also serve as a base for other natural language processing applications and downstream tasks that depend on Armenian language patterns.
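For the text generation use case described above, a minimal sketch with the Hugging Face `transformers` library could look like the following. The hub identifier `ai-forever/mGPT-1.3B-armenian` and all sampling values are assumptions rather than settings taken from the source, so check the model's hub page before relying on them. The helper names `generation_kwargs` and `generate` are illustrative.

```python
# Assumed hub identifier; verify it against the model's hub page.
MODEL_ID = "ai-forever/mGPT-1.3B-armenian"


def generation_kwargs(max_new_tokens=50, temperature=0.8, top_p=0.95):
    """Illustrative sampling settings, not values tuned by the model authors."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
    }


def generate(prompt, model_id=MODEL_ID, **overrides):
    """Load the model and return the prompt plus a sampled continuation."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_kwargs(**overrides))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Հայաստանի մայրաքաղաքը")` downloads the weights on first use and returns the prompt followed by a sampled continuation. Keyword overrides such as `max_new_tokens=20` are passed through to the sampling settings.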
