# MT0-Large Model
| Property | Value |
|---|---|
| Parameter Count | 1.23B |
| License | Apache 2.0 |
| Paper | Crosslingual Generalization through Multitask Finetuning |
| Supported Languages | 101 |
## What is mt0-large?

MT0-Large is a multilingual text-to-text transformer from the BLOOMZ & mT0 model family. It was finetuned on the crosslingual task mixture xP3, which enables it to follow human instructions in dozens of languages zero-shot. With 1.23 billion parameters and training coverage of 101 languages, the model represents a significant advance in multilingual AI capabilities.
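As a rough illustration of this zero-shot, instruction-following interface, here is a minimal inference sketch using the Hugging Face `transformers` library. The checkpoint name `bigscience/mt0-large` and the example prompt are assumptions for illustration, not taken from this card.

```python
# Minimal inference sketch (illustrative, not an official snippet).
# Assumes the checkpoint "bigscience/mt0-large" from the BLOOMZ & mT0 release.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# A single natural-language instruction; no task-specific finetuning is needed.
inputs = tokenizer.encode("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```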
## Implementation Details

The model is built on the mT5 architecture and was finetuned for 25,000 steps on 4.62 billion tokens. Training ran on TPUv4-64 hardware with bfloat16 precision, orchestrated through T5X and implemented in JAX. A loading sketch in that precision follows the list below.
- Architecture: mT5-Large
- Training Infrastructure: TPUv4-64
- Precision: bfloat16
- Framework: JAX with T5X orchestration
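Since training used bfloat16, one option is to load the weights at the same precision. The sketch below assumes PyTorch and the standard `torch_dtype` argument of `from_pretrained`; the checkpoint name is again an assumption.

```python
# Sketch: load the checkpoint in bfloat16, mirroring the training precision.
# Hardware without native bfloat16 support may run slowly or fall back to fp32.
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/mt0-large",      # assumed checkpoint name
    torch_dtype=torch.bfloat16,  # match the bfloat16 training precision
)
```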
## Core Capabilities
- Zero-shot cross-lingual task generalization
- Multilingual instruction following
- Text-to-text generation across 101 languages
- Support for various tasks including translation, sentiment analysis, and query generation (see the sketch after this list)
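All of these capabilities share one text-to-text interface: each task is phrased as a prompt, and the answer comes back as generated text. A sketch follows; the prompts are illustrative examples of the task types above, not taken from the card.

```python
# Every task uses the same text-to-text interface: prompt in, answer out.
# Prompts below are illustrative, not from the model card.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-large"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

prompts = [
    "Translate to French: The weather is nice today.",             # translation
    "Is this review positive or negative? Review: I loved it.",    # sentiment
    "Suggest a search query about multilingual language models.",  # query generation
]
for prompt in prompts:
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(inputs, max_new_tokens=30)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```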
## Frequently Asked Questions

**Q: What makes this model unique?**
MT0-Large's ability to perform zero-shot cross-lingual generalization and follow instructions in multiple languages makes it stand out. It can understand and generate text across 101 languages without requiring specific training for each new task or language.
**Q: What are the recommended use cases?**
The model excels at tasks expressed in natural language, including translation, sentiment analysis, query generation, and story writing. It's particularly effective when the prompt makes clear where the input ends, for example by closing with a full stop.
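To illustrate that prompting advice, the sketch below contrasts a prompt that stops mid-sentence with one closed by a full stop; with an ambiguous ending, the model may try to continue the input rather than answer it. The checkpoint name and prompts are assumptions for illustration.

```python
# Illustrates the prompting advice: make it unambiguous where the input ends.
# Without the closing full stop, the model may continue the French sentence
# instead of translating it.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

for prompt in ("Translate to English: Je t'aime", "Translate to English: Je t'aime."):
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(inputs, max_new_tokens=20)
    print(f"{prompt!r} -> {tokenizer.decode(outputs[0], skip_special_tokens=True)!r}")
```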