opus-mt-en-trk

Maintained by: Helsinki-NLP

Property        Value
License         Apache 2.0
Developer       Helsinki-NLP
Architecture    Transformer
Training Date   2020-08-01

What is opus-mt-en-trk?

opus-mt-en-trk is a machine translation model that translates from English into Turkic languages. Developed by Helsinki-NLP, this transformer-based model covers more than 24 target language variants, including Turkish, Azerbaijani, Kazakh, and Uzbek, some of them in more than one script (Latin, Cyrillic, or Arabic).

Implementation Details

The model uses a transformer architecture with SentencePiece tokenization (spm32k,spm32k, i.e. a 32k-piece SentencePiece model on both the source and the target side). Each input sentence must begin with a language token of the form >>id<<, where id is the ID of the target language. The model was trained on the OPUS corpus, and its performance varies across Turkic languages: Turkish (BLEU: 34.6), Kyrgyz (BLEU: 28.6), and Azerbaijani (BLEU: 26.8) show the strongest results.

  • Preprocessing includes normalization and SentencePiece tokenization
  • Supports multiple script variants (Latin, Cyrillic, Arabic) for several languages
  • Trained on OPUS corpus with 2M sentence pairs
  • Implements language-specific tokens for target language selection
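
The target-language token can be illustrated with a minimal sketch. It assumes the model is published on the Hugging Face Hub as Helsinki-NLP/opus-mt-en-trk and loaded through the transformers Marian classes; the helper names add_target_token and translate are hypothetical, and "tur" is an assumed example ID that should be checked against the model card's list of valid target IDs.

```python
import re

# Matches an existing >>id<< token at the start of a sentence.
_TOKEN_RE = re.compile(r"^>>[A-Za-z_]+<<")


def add_target_token(sentence: str, lang_id: str) -> str:
    """Prefix a sentence with the >>id<< token that selects the target language."""
    sentence = sentence.strip()
    if _TOKEN_RE.match(sentence):
        raise ValueError("sentence already starts with a language token")
    return f">>{lang_id}<< {sentence}"


def translate(sentences, lang_id, model_name="Helsinki-NLP/opus-mt-en-trk"):
    """Translate English sentences into the language named by lang_id.

    model_name is an assumed Hub identifier; the first call downloads the weights.
    """
    # Imported lazily so add_target_token works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    prefixed = [add_target_token(s, lang_id) for s in sentences]
    batch = tokenizer(prefixed, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling translate(["How are you?"], "tur") would then request Turkish output; the tokenizer applies the SentencePiece preprocessing itself, so the caller only supplies the token and the raw English text.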

Core Capabilities

  • Multi-target translation supporting 24+ Turkic language variants
  • Handles both modern and historical Turkic languages (including Ottoman Turkish)
  • Best performance for Turkish (BLEU: 34.6) and Kyrgyz (BLEU: 28.6)
  • Supports different writing systems for the same language
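
Because the same language can be requested in more than one writing system, it can help to group target IDs by language before choosing one. The sketch below assumes IDs follow a language_Script pattern such as uzb_Latn; the TARGET_IDS list is an illustrative, assumed subset rather than the model's full list, and both function names are hypothetical.

```python
from collections import defaultdict

# Assumed, illustrative subset of target IDs; consult the model card for the real list.
TARGET_IDS = [
    "tur", "aze_Latn", "kaz_Cyrl", "kaz_Latn",
    "uzb_Cyrl", "uzb_Latn", "ota_Arab", "ota_Latn",
]


def group_by_language(target_ids):
    """Map each language code to its available scripts.

    IDs without a script suffix (for example "tur") map to [None].
    """
    scripts = defaultdict(list)
    for target_id in target_ids:
        lang, _, script = target_id.partition("_")
        scripts[lang].append(script or None)
    return dict(scripts)


def inputs_for_all_scripts(sentence, lang, target_ids):
    """Build one token-prefixed input per script variant of lang."""
    return [
        f">>{target_id}<< {sentence}"
        for target_id in target_ids
        if target_id.partition("_")[0] == lang
    ]
```

For example, inputs_for_all_scripts("Good morning.", "uzb", TARGET_IDS) yields one input for Cyrillic and one for Latin Uzbek, which can be sent to the model in a single batch.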

Frequently Asked Questions

Q: What makes this model unique?

The model handles multiple Turkic languages and their script variants in a single set of weights, which makes it a versatile tool for translating into much of the Turkic language family. The language token lets the target language be chosen per sentence at inference time.

Q: What are the recommended use cases?

The model is best suited for translating into Turkish, Kyrgyz, and Azerbaijani, where it shows its highest BLEU scores (34.6, 28.6, and 26.8). It is particularly useful for applications that need several Turkic target languages from one model, though performance varies significantly between languages.
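
Since quality differs so much between languages, an application might enable only the targets whose reported score clears a chosen bar. The sketch below uses the three BLEU scores quoted in this document; the language IDs, the function name usable_targets, and any threshold passed to it are assumptions made for illustration, not recommendations from the model's authors.

```python
# BLEU scores quoted in this document; the IDs used as keys are assumed.
REPORTED_BLEU = {
    "tur": 34.6,       # Turkish
    "kir_Cyrl": 28.6,  # Kyrgyz
    "aze_Latn": 26.8,  # Azerbaijani
}


def usable_targets(bleu_by_id, min_bleu):
    """Return target IDs whose reported BLEU is at least min_bleu, best first."""
    kept = [(score, target_id) for target_id, score in bleu_by_id.items()
            if score >= min_bleu]
    return [target_id for _, target_id in sorted(kept, reverse=True)]
```

With a hypothetical threshold of 27.0, usable_targets(REPORTED_BLEU, 27.0) keeps Turkish and Kyrgyz and drops Azerbaijani. BLEU is only a rough proxy, so a threshold like this is a starting point for review rather than a guarantee of output quality.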
