opus-mt-tc-big-en-tr

Maintained By
Helsinki-NLP

Model Type: Neural Machine Translation
Architecture: Transformer-big
Languages: English to Turkish
Release Date: 2022-02-25
Paper: OPUS-MT Paper
Best BLEU Score: 42.3 (Tatoeba test set)

What is opus-mt-tc-big-en-tr?

opus-mt-tc-big-en-tr is a neural machine translation model for English-to-Turkish translation. Developed by Helsinki-NLP as part of the OPUS-MT project, it uses the transformer-big architecture and was trained on OPUS data with additional back-translation data (opusTCv20210807+bt).

Implementation Details

The model uses SentencePiece tokenization with a 32k vocabulary for both the source and target languages. It was originally trained with the Marian NMT framework and then converted to PyTorch with the Hugging Face transformers library for broader accessibility. Among the reported test sets, it scores highest on the Tatoeba benchmark (42.3 BLEU).

  • Tokenization: SentencePiece (spm32k,spm32k)
  • Framework: PyTorch (converted from Marian NMT)
  • Training Data: OPUS TC v2021-08-07 with back-translation
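The tokenize, generate, and decode steps implied by the details above can be sketched as follows. This is a minimal sketch, not the maintainers' reference code: it assumes the converted checkpoint is published on the Hugging Face Hub under the ID Helsinki-NLP/opus-mt-tc-big-en-tr and that the transformers, sentencepiece, and torch packages are installed. The helper names load_en_tr, translate_texts, and demo are hypothetical and chosen only for illustration.

```python
# Assumed Hub ID for the converted checkpoint; adjust if your copy lives elsewhere.
MODEL_ID = "Helsinki-NLP/opus-mt-tc-big-en-tr"


def load_en_tr(model_id=MODEL_ID):
    """Load the SentencePiece-based tokenizer and the converted PyTorch model."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    return tokenizer, model


def translate_texts(texts, tokenizer, model):
    """Tokenize English sentences, generate Turkish output, and decode it to strings."""
    # padding=True pads the sentences to a common length so they form one tensor batch.
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def demo():
    """Download the weights on first use and print translations of two sentences."""
    tokenizer, model = load_en_tr()
    sentences = [
        "I know Tom didn't want to eat that.",
        "On Sundays, we would get up early and go fishing.",
    ]
    for line in translate_texts(sentences, tokenizer, model):
        print(line)
```

Calling demo() downloads the checkpoint on first use, so it is left uninvoked here. Keeping tokenization and decoding in one helper means the same function works for a single sentence or a small list.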

Core Capabilities

  • High-quality English to Turkish translation
  • Achieves 42.3 BLEU score on Tatoeba test set
  • Strong performance on news translation (25.4 BLEU on newstest2017)
  • Integrates with the Hugging Face transformers pipeline API
  • Supports batch translation
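The pipeline integration and batch translation listed above might look like the sketch below. It assumes the Hub ID Helsinki-NLP/opus-mt-tc-big-en-tr and an installed transformers package with a PyTorch backend. The function names build_translator and translate_batch are hypothetical, and the batch size of 8 is an arbitrary illustrative default, not a recommendation from the model authors.

```python
# Assumed Hub ID for the model; not stated verbatim in the text above.
MODEL_ID = "Helsinki-NLP/opus-mt-tc-big-en-tr"


def build_translator(model_id=MODEL_ID):
    """Create a transformers translation pipeline for the English-to-Turkish model."""
    # Imported lazily so translate_batch can be exercised without transformers installed.
    from transformers import pipeline

    return pipeline("translation", model=model_id)


def translate_batch(sentences, translator, batch_size=8):
    """Translate a list of English sentences and return the Turkish strings in order."""
    if not sentences:
        # Nothing to translate; skip the pipeline call entirely.
        return []
    # For a list input, a translation pipeline returns one dict per sentence.
    results = translator(sentences, batch_size=batch_size)
    return [item["translation_text"] for item in results]
```

A typical call would be translate_batch(["How are you?", "See you tomorrow."], build_translator()). The translator is passed in as an argument so that the pipeline is built once and reused across calls.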

Frequently Asked Questions

Q: What makes this model unique?

This model is part of OPUS-MT, a larger initiative to make high-quality machine translation accessible for many language pairs. It targets English-to-Turkish translation specifically, combining a transformer-big architecture with OPUS training data augmented by back-translation.

Q: What are the recommended use cases?

The model is suited to English-to-Turkish translation of general-domain content, including in production environments. Its reported scores cover everyday language (42.3 BLEU on the Tatoeba test set) and news text (25.4 BLEU on newstest2017), which makes it suitable for both professional and academic applications.
