opus-tatoeba-en-tr

Property	Value
Model Type	Transformer-align
Languages	English → Turkish
Training Data	OPUS + back-translated data
Release Date	April 10, 2021
Best BLEU Score	41.5 (Tatoeba test set)

What is opus-tatoeba-en-tr?

opus-tatoeba-en-tr is a machine translation model developed by Helsinki-NLP for translating from English to Turkish. Built on the transformer-align architecture, it achieves a BLEU score of 41.5 and a chrF score of 0.684 on the Tatoeba test set.
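To give a feel for what a chrF figure such as 0.684 measures, the sketch below computes a simplified sentence-level chrF: a character n-gram F-score on a 0 to 1 scale. It is an illustration only, not the scoring tool behind the reported numbers. The function names, the averaging scheme, and the defaults (n-grams up to length 6, beta of 2, whitespace removed) are assumptions chosen for the example, so its output will not match standard evaluation tooling exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of length n, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this n-gram order is longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: with beta = 2, recall counts more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An exact match scores 1.0 and a hypothesis that shares no characters with the reference scores 0.0. Because the metric works on characters rather than whole words, a translation that gets a Turkish stem right but a suffix wrong still earns partial credit.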

Implementation Details

The model uses the transformer-align architecture. Input text is normalized and then tokenized with SentencePiece, using a 32k vocabulary on both the source and the target side (spm32k,spm32k). It was trained on OPUS data supplemented with back-translated data, as indicated by the '+bt' in the model version.

Pre-processing: Normalization + SentencePiece (32k vocabulary)
Architecture: Transformer-align
Training Data: OPUS corpus + back-translated data
Evaluation: Multiple test sets including news and Tatoeba
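The details above can be tied together with a short usage sketch. It assumes the model is published on the Hugging Face Hub under the id Helsinki-NLP/opus-tatoeba-en-tr and that the transformers, sentencepiece, and torch packages are installed; neither assumption comes from this page. The helper names chunks and translate are hypothetical. The tokenizer bundled with the checkpoint is expected to apply the SentencePiece step, so no separate preprocessing appears here.

```python
# Assumed Hugging Face Hub id; verify it before relying on it.
MODEL_ID = "Helsinki-NLP/opus-tatoeba-en-tr"


def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(sentences, model_id=MODEL_ID, batch_size=8):
    """Translate English sentences to Turkish (downloads weights on first use)."""
    # Imported lazily so that the helper above works without these packages.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    translations = []
    for batch in chunks(list(sentences), batch_size):
        # Tokenize the batch and pad it to a common length.
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**encoded)
        translations.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translations
```

Calling translate(["How are you today?"]) should return a list containing one Turkish sentence. Batching keeps memory use bounded when translating long lists; the batch size of 8 is an arbitrary default.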

Core Capabilities

English-to-Turkish translation (one direction only)
Strong performance on general-domain text (Tatoeba: 41.5 BLEU, 0.684 chrF)
Lower scores on the news domain (News Test 2017: 22.8 BLEU)
Evaluated on more than one text type (everyday sentences and news)

Frequently Asked Questions

Q: What makes this model unique?

This model targets a single language pair and direction, English to Turkish, and scores 41.5 BLEU and 0.684 chrF on the Tatoeba test set. It combines the transformer-align architecture with normalization and SentencePiece tokenization.

Q: What are the recommended use cases?

The model is suited to general-purpose English to Turkish translation. It performs best on everyday language (41.5 BLEU on Tatoeba) and scores noticeably lower on news content (22.8 BLEU on News Test 2017), so output for news-style text is worth checking more carefully.