Aya-101
| Property | Value |
|---|---|
| Parameter Count | 12.9B |
| Model Type | Text2Text Generation |
| Architecture | T5-based Transformer |
| License | Apache-2.0 |
| Paper | Aya Model Paper |
What is Aya-101?
Aya-101 is a groundbreaking multilingual language model developed by CohereForAI that supports instruction-following capabilities across 101 languages. Built on the T5 architecture, it represents a significant advancement in multilingual AI, outperforming existing models like mT0 and BLOOMZ while covering twice the number of languages.
Implementation Details
The model is implemented with the T5X framework and JAX and trained on TPUv4-128 hardware with a batch size of 256. Fine-tuning covers 25M samples drawn from multiple datasets, including xP3x, the Aya Dataset, and the Aya Collection.
- Architecture based on mt5-xxl
- Trained on 5 curated datasets
- Supports 101 languages across various scripts and families
- Uses SafeTensors format for improved security
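Since Aya-101 is a Text2Text (seq2seq) model, it loads through the standard Hugging Face seq2seq classes. A minimal inference sketch, assuming the `transformers` and `torch` packages are installed; the checkpoint id `CohereForAI/aya-101` follows the naming on the model hub, and the helper name `aya_generate` is illustrative:

```python
# Minimal sketch: single-instruction inference with Aya-101.
# T5-based models are encoder-decoder, hence AutoModelForSeq2SeqLM.

def aya_generate(instruction: str, max_new_tokens: int = 128) -> str:
    """Run one instruction through Aya-101 and return the decoded reply."""
    # Lazy imports so the helper can be defined without loading the libraries.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-101")
    model = AutoModelForSeq2SeqLM.from_pretrained("CohereForAI/aya-101")

    inputs = tokenizer(instruction, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # A Turkish instruction, exercising the multilingual instruction following
    # described in this card (prompt text is an illustrative example).
    print(aya_generate("Türkçe bir cümleyi İngilizceye çevir: Merhaba dünya"))
```

Because the model is instruction-tuned, the instruction is passed directly as the encoder input; no special prompt template is required.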
Core Capabilities
- Multilingual instruction following
- Cross-lingual translation
- Text generation across diverse scripts
- Support for low-resource languages
- Enhanced safety features and bias mitigation
Frequently Asked Questions
Q: What makes this model unique?
Aya-101 stands out for its extensive language coverage (101 languages) and superior performance in both automatic and human evaluations. It's particularly notable for supporting low-resource languages while maintaining high performance standards.
Q: What are the recommended use cases?
The model excels in multilingual text generation, translation, and instruction-following tasks. It's particularly valuable for applications requiring broad language support or working with low-resource languages.
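Given the 12.9B parameter count listed above, full fp32 weights alone take roughly 12.9B × 4 bytes ≈ 52 GB, so half precision and automatic device placement are common in practice. A hedged loading sketch, assuming `transformers`, `torch`, and `accelerate` are installed (the function name is illustrative):

```python
# Memory-conscious loading sketch for the 12.9B-parameter checkpoint.

def load_aya_half_precision():
    """Load Aya-101 in float16 with automatic device placement."""
    # Lazy imports so the helper can be defined without the libraries present.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-101")
    model = AutoModelForSeq2SeqLM.from_pretrained(
        "CohereForAI/aya-101",
        torch_dtype=torch.float16,  # halves weight memory to roughly 26 GB
        device_map="auto",          # spreads layers across available devices
    )
    return tokenizer, model
```

Half precision is generally sufficient for inference; for training or further fine-tuning, mixed precision or parameter-efficient methods would typically be needed instead.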