opus-mt-en-trk

Helsinki-NLP

Neural machine translation model from English into Turkic languages, built by Helsinki-NLP. It supports 24+ language variants, with BLEU scores ranging from 0.1 to 34.6 depending on the target language.

Property        Value
License         Apache 2.0
Developer       Helsinki-NLP
Architecture    Transformer
Training Date   2020-08-01

What is opus-mt-en-trk?

opus-mt-en-trk is a specialized machine translation model designed to translate from English to various Turkic languages. Developed by Helsinki-NLP, this transformer-based model supports translation into 24+ language variants including Turkish, Azerbaijani, Kazakh, and Uzbek in different scripts (Latin, Cyrillic, and Arabic).

Implementation Details

The model uses a transformer architecture with SentencePiece tokenization (spm32k,spm32k). Each input sentence must begin with a language token of the form >>id<<, where id is a valid target language ID, to select the target language. The model was trained on the OPUS corpus, and its performance varies widely across Turkic languages, with Turkish (BLEU: 34.6), Kyrgyz (BLEU: 28.6), and Azerbaijani (BLEU: 26.8) showing the strongest results.

  • Preprocessing includes normalization and SentencePiece tokenization
  • Supports multiple script variants (Latin, Cyrillic, Arabic) for several languages
  • Trained on OPUS corpus with 2M sentence pairs
  • Implements language-specific tokens for target language selection
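
The target-language token described above can be sketched in a few lines of Python. This is a minimal illustration, not the developers' reference code: it assumes the model is published on the Hugging Face Hub as "Helsinki-NLP/opus-mt-en-trk", that the transformers, sentencepiece, and torch packages are installed, and that "tur" is a valid target ID for Turkish. The helper names add_target_token and translate are hypothetical.

```python
def add_target_token(text: str, lang_id: str) -> str:
    """Prefix a sentence with the >>id<< token that selects the target language."""
    return f">>{lang_id}<< {text.strip()}"


def translate(sentences, lang_id, model_name="Helsinki-NLP/opus-mt-en-trk"):
    """Translate English sentences into the Turkic variant named by lang_id.

    model_name is an assumed Hub identifier; adjust it if the checkpoint
    lives elsewhere.
    """
    # Imported lazily so add_target_token works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Every input sentence carries the target-language token at its start.
    batch = tokenizer(
        [add_target_token(s, lang_id) for s in sentences],
        return_tensors="pt",
        padding=True,
    )
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling translate(["How are you?"], "tur") would download the checkpoint on first use and return a list with one Turkish string. Loading the tokenizer and model once and reusing them is preferable when translating many batches.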

Core Capabilities

  • Multi-target translation supporting 24+ Turkic language variants
  • Handles both modern and historical Turkic languages (including Ottoman Turkish)
  • Best performance for Turkish (BLEU: 34.6) and Kyrgyz (BLEU: 28.6)
  • Supports different writing systems for the same language
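
Because one language can be written in several scripts, the target token has to name the script as well as the language. The sketch below shows one possible way to resolve a (language, script) pair to a token. The lookup table and the function name target_token are hypothetical, and the IDs listed (kaz_Cyrl, uzb_Latn, and so on) are assumptions that should be checked against the tokenizer's own list of supported language codes before use.

```python
# Hypothetical lookup table. The IDs are assumptions; verify them against
# the tokenizer's supported language codes before relying on them.
TARGET_IDS = {
    ("kazakh", "cyrillic"): "kaz_Cyrl",
    ("kazakh", "latin"): "kaz_Latn",
    ("uzbek", "cyrillic"): "uzb_Cyrl",
    ("uzbek", "latin"): "uzb_Latn",
    ("ottoman turkish", "arabic"): "ota_Arab",
    ("ottoman turkish", "latin"): "ota_Latn",
}


def target_token(language: str, script: str) -> str:
    """Return the >>id<< token for a language written in a given script."""
    key = (language.strip().lower(), script.strip().lower())
    if key not in TARGET_IDS:
        raise KeyError(f"no target ID known for {language!r} in {script!r} script")
    return f">>{TARGET_IDS[key]}<<"
```

Failing loudly on an unknown pair is deliberate here: a mistyped or unsupported token would otherwise be passed to the model as ordinary text.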

Frequently Asked Questions

Q: What makes this model unique?

This model's distinguishing feature is that a single model handles multiple Turkic languages and their script variants, making it a versatile tool for translating English into much of the Turkic language family. The language token at the start of each input selects the target language at inference time, so no separate model per language is needed.

Q: What are the recommended use cases?

The model is best suited for translating into major Turkic languages like Turkish, Azerbaijani, and Kyrgyz where it shows the highest BLEU scores. It's particularly useful for applications requiring translation into multiple Turkic languages, though performance varies significantly between languages.
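
Since quality differs so much between targets, an application may want to gate which languages it offers. The following sketch uses only the BLEU scores quoted in this document. The function name languages_above and any threshold passed to it are hypothetical choices, not recommendations from the developers.

```python
# BLEU scores as quoted in this document for the three strongest targets.
REPORTED_BLEU = {"Turkish": 34.6, "Kyrgyz": 28.6, "Azerbaijani": 26.8}


def languages_above(threshold: float, scores=REPORTED_BLEU):
    """Return languages whose reported BLEU meets the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, bleu in ranked if bleu >= threshold]
```

With a threshold of 28.0, for example, only Turkish and Kyrgyz pass. Scores for other targets would need to be added from the model's published benchmarks before the same check could cover them.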
