opus-mt-en-iir

Maintained by: Helsinki-NLP

Model Type: Transformer
Task: Machine Translation
Source Language: English
Target Languages: 30+ Indo-Iranian languages
BLEU Score: 13.7 (Tatoeba test)
Training Date: August 1, 2020
Model URL: Hugging Face

What is opus-mt-en-iir?

opus-mt-en-iir is a machine translation model developed by Helsinki-NLP for translating English text into Indo-Iranian languages. A single model covers more than 30 target languages, including Hindi, Bengali, Persian, and Gujarati. It uses a transformer architecture with SentencePiece tokenization and a 32k vocabulary.

Implementation Details

Input text is normalized and tokenized with SentencePiece. Each source sentence must begin with a language token of the form >>id<<, where id is the identifier of the target language; this token tells the model which language to produce. The model was trained on the OPUS corpus, and performance varies by language pair: reported scores include Marathi (BLEU 20.7), Hindi (BLEU 17.0), and Bengali (BLEU 15.3), against an overall BLEU of 13.7.

  • Preprocessing: Normalization + SentencePiece (spm32k,spm32k)
  • Architecture: Transformer-based neural machine translation
  • Performance metrics: Overall BLEU score of 13.7 and chrF score of 0.392
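
The sentence-initial token described above can be sketched in Python as follows. This is a minimal illustration, not the maintainers' reference code: the helper names add_target_token and translate are hypothetical, the hub identifier "Helsinki-NLP/opus-mt-en-iir" and the language ID "hin" (Hindi) are assumptions to verify against the model card, and the translation step assumes the Hugging Face transformers library (with PyTorch) is installed.

```python
from typing import List


def add_target_token(text: str, lang_id: str) -> str:
    """Prefix a source sentence with the >>id<< target-language token."""
    return f">>{lang_id}<< {text}"


def translate(texts: List[str], lang_id: str,
              model_name: str = "Helsinki-NLP/opus-mt-en-iir") -> List[str]:
    """Translate English sentences into the language selected by lang_id.

    model_name is an assumed hub identifier; adjust it if the model is
    published under a different name.
    """
    # Imported lazily so add_target_token works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Every sentence in the batch carries its own target-language token.
    batch = tokenizer([add_target_token(t, lang_id) for t in texts],
                      return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


if __name__ == "__main__":
    # "hin" is an assumed identifier for Hindi; check the model card for valid IDs.
    print(add_target_token("How are you?", "hin"))  # >>hin<< How are you?
```

Calling translate(["How are you?"], "hin") would download the model on first use. Because the token is attached per sentence, a batch could in principle mix target languages by prefixing each sentence with a different identifier.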

Core Capabilities

  • Supports translation to major Indo-Iranian languages including Hindi, Bengali, Persian, and Gujarati
  • Handles multiple script systems including Devanagari, Arabic, Cyrillic, and Latin
  • Provides consistent performance across news and general domain content
  • Offers flexibility through language-specific tokens for target language selection
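
Because one model writes in several scripts, a quick sanity check on the output script can catch a wrong or missing language token. The sketch below is one possible way to do that with the Python standard library only; dominant_script is a hypothetical helper, and it uses the first word of each character's Unicode name as a rough proxy for its script rather than the formal Unicode Script property.

```python
import unicodedata
from collections import Counter
from typing import Optional


def dominant_script(text: str) -> Optional[str]:
    """Return the most common script label among letters and marks in text.

    The label is the first word of the Unicode character name
    (e.g. "DEVANAGARI", "ARABIC", "CYRILLIC", "LATIN"), which is only
    an approximation of the character's script.
    """
    counts: Counter = Counter()
    for ch in text:
        # Count letters plus combining marks (vowel signs, virama, etc.).
        if ch.isalpha() or unicodedata.category(ch).startswith("M"):
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    for sample in ["नमस्ते", "سلام", "Привет", "hello"]:
        print(sample, dominant_script(sample))
```

A caller could compare the returned label with the script expected for the requested target language and flag mismatches for review.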

Frequently Asked Questions

Q: What makes this model unique?

This model's primary strength lies in its broad coverage of Indo-Iranian languages, supporting over 30 target languages with a single model. It's particularly useful for low-resource languages in this family, providing a practical solution for multilingual translation needs.

Q: What are the recommended use cases?

The model is best suited for general-purpose translation from English into Indo-Iranian languages. Its reported scores are comparatively strong for languages such as Marathi, Hindi, and Bengali, which makes it a practical option for content localization, document translation, and cross-lingual information access in South Asian contexts.
