mt0-large

Maintained By
bigscience

MT0-Large Model

  • Parameter Count: 1.23B
  • License: Apache 2.0
  • Paper: Crosslingual Generalization through Multitask Finetuning
  • Supported Languages: 101

What is mt0-large?

MT0-Large is a multilingual text-to-text transformer model from the BLOOMZ & mT0 family. It is designed to follow human instructions in dozens of languages zero-shot, a capability obtained by finetuning on the crosslingual task mixture xP3. With 1.23 billion parameters and support for 101 languages, the model represents a significant advancement in multilingual AI capabilities.

Implementation Details

The model is built on the mT5-Large architecture and was finetuned for 25,000 steps on 4.62 billion tokens. Training ran on TPUv4-64 hardware in bfloat16 precision, orchestrated through T5X and implemented in JAX.

  • Architecture: Based on mT5-Large
  • Training Infrastructure: TPUv4-64
  • Precision: bfloat16
  • Framework: JAX with T5X orchestration
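
As a usage-oriented illustration of these details, the following minimal sketch loads the checkpoint as a standard encoder-decoder (seq2seq) model. It assumes the Hugging Face transformers library with a PyTorch backend and the checkpoint name bigscience/mt0-large, and it mirrors the bfloat16 training precision (full precision works as well).

```python
# Minimal loading sketch (assumes transformers and torch are installed,
# and that the checkpoint is published as "bigscience/mt0-large").
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-large"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# bfloat16 mirrors the finetuning precision; omit torch_dtype for full fp32.
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)
```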

Core Capabilities

  • Zero-shot cross-lingual task generalization
  • Multilingual instruction following
  • Text-to-text generation across 101 languages
  • Support for various tasks including translation, sentiment analysis, and query generation (see the prompt sketch after this list)
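
To make these capabilities concrete, here is a hedged sketch of zero-shot prompting, reusing the tokenizer and model from the loading sketch above; the prompts themselves are illustrative examples, not taken from the original evaluation.

```python
# Zero-shot prompting sketch: one text-to-text interface covers
# translation, sentiment analysis, and other instruction-style tasks.
prompts = [
    "Translate to English: Je t'aime.",
    "Is the following review positive or negative? Review: the food was delicious.",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```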

Frequently Asked Questions

Q: What makes this model unique?

MT0-Large's ability to perform zero-shot cross-lingual generalization and follow instructions in multiple languages makes it stand out. It can understand and generate text across 101 languages without requiring specific training for each new task or language.

Q: What are the recommended use cases?

The model excels at tasks expressed in natural language, including translation, sentiment analysis, query generation, and story writing. It is particularly effective when prompts are clear, well structured, and make it obvious where the input ends, for example by closing with a full stop.
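
As a small illustration of the punctuation advice (again reusing the tokenizer and model from the loading sketch above; the prompt is a hypothetical example), the trailing full stop signals that the input is complete, whereas omitting it can lead the model to simply continue the sentence:

```python
# Contrast the same prompt with and without a terminating full stop.
# Without the final period the model may continue the French sentence;
# with it, the input is unambiguously complete and gets translated.
for prompt in ["Translate to English: Je t'aime", "Translate to English: Je t'aime."]:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(repr(prompt), "->", tokenizer.decode(outputs[0], skip_special_tokens=True))
```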
