nllb-200-distilled-600M

Maintained By
facebook

NLLB-200 Distilled 600M

Property     Value
License      CC-BY-NC 4.0
Framework    PyTorch
Task         Translation
Languages    200 languages

What is nllb-200-distilled-600M?

NLLB-200-distilled-600M is a compressed version of Facebook's No Language Left Behind (NLLB) translation model, specifically designed to provide efficient multilingual translation capabilities across 200 languages. This model represents a significant breakthrough in making machine translation accessible for low-resource languages while maintaining reasonable computational requirements.

Implementation Details

The model utilizes a transformer-based architecture that has been distilled from a larger model to achieve better efficiency while maintaining translation quality. It supports translation between any pair of its 200 supported languages, with a particular focus on low-resource languages, especially African languages.

  • Maximum input length of 512 tokens
  • Trained on both parallel multilingual data and monolingual Common Crawl data
  • Evaluated using BLEU, spBLEU, and chrF++ metrics
  • Optimized for Wikimedia domain content
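The model can be loaded through the Hugging Face transformers library under the published name `facebook/nllb-200-distilled-600M`. A minimal sketch: NLLB selects the source language via the tokenizer's `src_lang` and the target language by forcing the first generated token to a FLORES-200 language code (the example sentence and the English-to-Swahili pairing are illustrative choices, not from the model card).

```python
# Sketch: single-sentence translation with NLLB-200-distilled-600M.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"
# src_lang tells the tokenizer which FLORES-200 code to prepend to the input.
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Machine translation should work for every language."
# Respect the model's 512-token input limit.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# The target language is chosen by forcing the first decoded token
# to its FLORES-200 code (here Swahili, swh_Latn).
translated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("swh_Latn"),
    max_length=512,
)
print(tokenizer.batch_decode(translated, skip_special_tokens=True)[0])
```

Swapping `src_lang` and the forced target code is all that is needed to translate between any other pair of the 200 supported languages.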

Core Capabilities

  • Direct translation between 200 languages
  • Support for multiple writing systems including Latin, Arabic, Cyrillic, and various Asian scripts
  • Specialized handling of low-resource language pairs
  • Single sentence translation optimization
  • Research-focused implementation with academic use cases in mind
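Because the model is optimized for single-sentence inputs capped at 512 tokens, longer documents are typically split into sentences before translation. A minimal sketch of that preprocessing step; the `split_sentences` helper is hypothetical (real pipelines often use language-aware splitters), not part of the model:

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text on sentence-final punctuation so each
    translation request stays within the model's single-sentence
    sweet spot. Hypothetical helper for illustration only."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


doc = "NLLB covers 200 languages. Many are low-resource. Translate one sentence at a time!"
for sentence in split_sentences(doc):
    print(sentence)  # each sentence would be sent to the model separately
```

Each resulting sentence can then be tokenized and translated independently, keeping every request within the model's intended operating range.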

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to handle 200 languages, including many low-resource languages, while maintaining a relatively compact size through distillation makes it unique. It's specifically designed for research purposes and provides unprecedented coverage of African and Asian languages.

Q: What are the recommended use cases?

The model is primarily intended for research in machine translation, with a focus on low-resource languages. It is not recommended for production deployment, medical or legal translation, or document-length content; it is best suited for academic research and experimental applications in machine translation.
