opus-tatoeba-en-tr
| Property | Value |
|---|---|
| Model Type | Transformer-align |
| Languages | English → Turkish |
| Training Data | OPUS + Tatoeba |
| Release Date | April 10, 2021 |
| Best BLEU Score | 41.5 (Tatoeba test set) |
What is opus-tatoeba-en-tr?
opus-tatoeba-en-tr is a machine translation model developed by Helsinki-NLP for translating from English to Turkish. It is built on the transformer-align architecture and scores 41.5 BLEU and 0.684 chrF on the Tatoeba test set, its best result among the reported test sets.
Implementation Details
The model uses the transformer-align architecture. Input text is preprocessed with normalization and SentencePiece tokenization (spm32k,spm32k), meaning a 32k-vocabulary SentencePiece model on both the source and the target side. It was trained on the OPUS dataset supplemented with back-translated data, as indicated by the '+bt' in the model version.
- Pre-processing: Normalization + SentencePiece (32k vocabulary)
- Architecture: Transformer-align
- Training Data: OPUS corpus + back-translated data
- Evaluation: Multiple test sets including news and Tatoeba
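The following is a minimal sketch of how a model like this might be loaded and run in Python. It rests on several assumptions that the text above does not confirm: that the checkpoint is available on the Hugging Face Hub under the ID `Helsinki-NLP/opus-tatoeba-en-tr`, that it loads with the Marian classes from the `transformers` library, and that `transformers`, `torch`, and `sentencepiece` are installed. The helper names `chunked` and `translate` are illustrative only.

```python
from typing import Iterator, List

# Assumption: the checkpoint is published on the Hugging Face Hub under this ID.
MODEL_ID = "Helsinki-NLP/opus-tatoeba-en-tr"


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(sentences: List[str], batch_size: int = 8,
              model_id: str = MODEL_ID) -> List[str]:
    """Translate English sentences to Turkish, one batch at a time."""
    # Imported lazily so that `chunked` works without the heavy dependencies.
    from transformers import MarianMTModel, MarianTokenizer

    # The tokenizer applies the SentencePiece segmentation shipped with the model.
    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    outputs: List[str] = []
    for batch in chunked(sentences, batch_size):
        # Pad to the longest sentence in the batch and truncate overlong input.
        encoded = tokenizer(batch, return_tensors="pt",
                            padding=True, truncation=True)
        generated = model.generate(**encoded)
        outputs.extend(
            tokenizer.batch_decode(generated, skip_special_tokens=True)
        )
    return outputs
```

Under those assumptions, calling `translate(["How are you today?"])` would download the weights on first use and return a list with one Turkish string. Batching keeps memory use bounded when translating many sentences.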
Core Capabilities
- High-quality English to Turkish translation
- Strong performance on general domain text (Tatoeba: 41.5 BLEU)
- News-domain translation at a lower score (News Test 2017: 22.8 BLEU)
- Evaluated on more than one text type (short Tatoeba sentences and news text)
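To give a feel for what a chrF value such as the 0.684 reported for this model measures, here is a simplified, sentence-level character n-gram F-score in plain Python. It is a sketch and not the scorer used to produce the published numbers: it assumes β = 2 and n-grams up to length 6 (common defaults), strips spaces, and averages precision and recall over n-gram orders. For comparable results, an established tool such as sacreBLEU should be used instead.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF in [0, 1]; illustrative only."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        # Clipped overlap: each n-gram counts at most as often as in both texts.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An exact match scores 1.0 and strings with no shared characters score 0.0; a partial translation such as `simple_chrf("merhaba", "merhaba dünya")` lands in between because precision is perfect while recall is not.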
Frequently Asked Questions
Q: What makes this model unique?
This model is dedicated to a single direction, English to Turkish, and scores 41.5 BLEU on the Tatoeba test set. It pairs the transformer-align architecture with normalization and 32k-vocabulary SentencePiece preprocessing, and its training data includes back-translated text.
Q: What are the recommended use cases?
The model is suited to general-purpose English to Turkish translation. It performs best on everyday language (41.5 BLEU on Tatoeba) and scores lower on news content (22.8 BLEU on News Test 2017), so output for news-style text may need closer review.