opus-mt-en-iir

Property	Value
Model Type	Transformer
Task	Machine Translation
Source Language	English
Target Languages	30+ Indo-Iranian languages
BLEU Score	13.7 (Tatoeba test)
Training Date	August 1, 2020
Model URL	Hugging Face

What is opus-mt-en-iir?

opus-mt-en-iir is a multilingual machine translation model developed by Helsinki-NLP for translating English text into Indo-Iranian languages. A single model covers over 30 target languages, including Hindi, Bengali, Persian, and Gujarati. It uses a transformer architecture with SentencePiece tokenization and a 32k vocabulary.

Implementation Details

Input text is preprocessed with normalization and SentencePiece. Because one model serves every target language, each input sentence must begin with a language token in the format >>id<<, where id is the target language identifier. The model was trained on the OPUS corpus, and its performance varies across language pairs; its best Tatoeba results are for Marathi (BLEU 20.7), Hindi (BLEU 17.0), and Bengali (BLEU 15.3).
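The language token described above can be added with a small helper. The following is a minimal sketch, not the model authors' own code. It assumes the model is loaded through the Hugging Face transformers Marian classes under the checkpoint name Helsinki-NLP/opus-mt-en-iir, and that identifiers such as hin and ben are valid for this model. The function names with_target_token and translate are hypothetical.

```python
def with_target_token(text: str, lang_id: str) -> str:
    """Prefix a sentence with the >>id<< target-language token."""
    lang_id = lang_id.strip()
    # Reject empty identifiers and characters that would break the token.
    if not lang_id or any(c in lang_id for c in "<> \t\n"):
        raise ValueError(f"invalid language identifier: {lang_id!r}")
    return f">>{lang_id}<< {text.strip()}"


def translate(sentences, lang_id, model_name="Helsinki-NLP/opus-mt-en-iir"):
    """Translate English sentences into one target language.

    Assumption: the checkpoint name and the use of the Marian classes are
    illustrative; adjust them to match how the model is actually deployed.
    """
    # Imported lazily so with_target_token works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    prefixed = [with_target_token(s, lang_id) for s in sentences]
    batch = tokenizer(prefixed, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


if __name__ == "__main__":
    # Only the prefixing step runs here; translate() downloads model weights.
    print(with_target_token("How are you?", "hin"))
```

Calling translate(["How are you?"], "hin") would download the model weights on first use. Keeping the prefixing step in its own function makes it easy to check that every sentence carries a token before it reaches the tokenizer.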

Preprocessing: Normalization + SentencePiece (spm32k,spm32k)
Architecture: Transformer-based neural machine translation
Performance metrics: Overall BLEU score of 13.7 and chrF score of 0.392
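For readers unfamiliar with chrF, the metric is an F-score over character n-grams, which is why it is reported on a 0 to 1 scale alongside BLEU. The sketch below is a simplified sentence-level version for illustration only. It assumes n-gram orders 1 to 6, beta = 2, and whitespace removed before counting; it is not the evaluation script that produced the 0.392 figure, and the function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: F-beta over averaged character n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to have n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped matching n-gram count
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    print(chrf("hello world", "hello word"))
```

Because it works on characters rather than words, chrF gives partial credit for near-miss word forms, which matters for morphologically rich target languages.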

Core Capabilities

Supports translation to major Indo-Iranian languages including Hindi, Bengali, Persian, and Gujarati
Handles multiple script systems including Devanagari, Arabic, Cyrillic, and Latin
Handles news and general-domain content, with quality that varies by language pair
Offers flexibility through language-specific tokens for target language selection

Frequently Asked Questions

Q: What makes this model unique?

This model's primary strength lies in its broad coverage of Indo-Iranian languages, supporting over 30 target languages with a single model. It's particularly useful for low-resource languages in this family, providing a practical solution for multilingual translation needs.

Q: What are the recommended use cases?

The model is best suited for general-purpose translation from English into Indo-Iranian languages. Its strongest results are for Marathi, Hindi, and Bengali, which makes it a reasonable fit for content localization, document translation, and cross-lingual information access in South Asian contexts. For other target languages, check output quality before relying on it.
