opus-mt-tc-big-el-en
| Property | Value |
| --- | --- |
| Architecture | Transformer-big |
| Source Language | Modern Greek (1453-) |
| Target Language | English |
| Release Date | 2022-02-25 |
| Tokenization | SentencePiece (32k) |
| BLEU Score | 68.8 (Tatoeba test) |
| Paper | OPUS-MT Paper |
What is opus-mt-tc-big-el-en?
opus-mt-tc-big-el-en is a neural machine translation model for translating Modern Greek text to English. Developed by Helsinki-NLP as part of the OPUS-MT project, the model uses the transformer-big architecture and was trained on OPUS data augmented with back-translated text (the opusTCv20210807+bt dataset).
Implementation Details
The model is implemented using the Marian NMT framework and has been converted to PyTorch using the Hugging Face transformers library. It utilizes SentencePiece tokenization with a vocabulary size of 32k for both source and target languages.
- Built on transformer-big architecture for enhanced performance
- Uses SentencePiece tokenization (32k vocabulary)
- Trained on OPUS corpus with back-translation augmentation
- Achieves 68.8 BLEU score on Tatoeba test set
- Shows 33.9 BLEU score on flores101-devtest
Core Capabilities
- High-quality Greek to English translation
- Supports batch translation
- Compatible with Hugging Face transformers pipeline
- Optimized for production deployment
- Excellent performance on general-domain text
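The pipeline compatibility and batch support mentioned above can be sketched as follows. This is a minimal example assuming the same hub id as before; passing a list of sentences in one call lets the pipeline batch them.

```python
from transformers import pipeline

# Translation pipeline with an explicit model id (assumed hub name).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-el-en")

# Batch translation: pass a list of Greek source sentences in one call.
greek_sentences = [
    "Καλημέρα, τι κάνεις;",          # "Good morning, how are you?"
    "Το βιβλίο είναι πάνω στο τραπέζι.",  # "The book is on the table."
]
results = translator(greek_sentences, max_length=128)
for src, res in zip(greek_sentences, results):
    print(src, "->", res["translation_text"])
```

Each result is a dict with a `translation_text` key, so downstream code can collect the English outputs directly.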
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its strong Greek-to-English translation quality, reaching 68.8 BLEU on the Tatoeba test set. It is part of the larger OPUS-MT initiative to make machine translation freely available across many language pairs.
Q: What are the recommended use cases?
The model is ideal for translating Modern Greek text to English in various applications, including content localization, document translation, and automated translation services. It's particularly effective for general-domain text as demonstrated by its strong performance on standardized test sets.