TURNA
| Property | Value |
|---|---|
| Parameter Count | 1.14B |
| Model Type | Text2Text Generation |
| Architecture | Transformer-based encoder-decoder |
| License | Non-commercial academic research only |
| Paper | arXiv:2401.14373 |
What is TURNA?
TURNA is a Turkish language model developed by TABILAB at the Boğaziçi University Computer Engineering Department. Built on the UL2 framework, it features 36 encoder-decoder layers and 16 attention heads, and was pretrained on a diverse corpus including OSCAR, OPUS, and Wikipedia, making it effective for both language understanding and generation tasks.
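Because TURNA is an encoder-decoder checkpoint, it can be driven through the standard Hugging Face `transformers` text2text interface. The sketch below assumes the checkpoint is published as `boun-tabi-LMG/TURNA` on the Hub; the UL2-style mode prefix and the prompt are illustrative only, so check the model card for the exact tokens TURNA expects.

```python
# Minimal generation sketch; the checkpoint id boun-tabi-LMG/TURNA is assumed.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("boun-tabi-LMG/TURNA")
model = AutoModelForSeq2SeqLM.from_pretrained("boun-tabi-LMG/TURNA")

# "[S2S]" is a UL2-style mode prefix; verify the exact prefix on the model card.
inputs = tokenizer("[S2S] Türkiye'nin başkenti", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```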
Implementation Details
The architecture uses 1024-dimensional token embeddings and 2816-dimensional feed-forward (MLP) layers with gated GELU activations. TURNA uses a unigram subword tokenizer trained on 10GB of text data, with a vocabulary of 32,128 tokens (including special tokens).
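These dimensions can be read straight out of the published configuration. A small inspection sketch, again assuming the `boun-tabi-LMG/TURNA` checkpoint id and a T5-style config:

```python
# Confirm the reported dimensions from the released config and tokenizer.
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("boun-tabi-LMG/TURNA")
tokenizer = AutoTokenizer.from_pretrained("boun-tabi-LMG/TURNA")

print(config.d_model)    # expected: 1024 (token embedding dimension)
print(config.d_ff)       # expected: 2816 (feed-forward dimension)
print(config.num_heads)  # expected: 16 (attention heads)
print(len(tokenizer))    # expected: 32128 (vocabulary incl. special tokens)
```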
- Training completed over 1,740,000 steps with a batch size of 48
- Input and output lengths of 512 tokens
- Effectively processed ~42.7B tokens during training (see the arithmetic check after this list)
- Trained on TPU v3-8 machines through Google TPU Research Cloud
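The 42.7B-token figure follows directly from the other training numbers: steps × batch size × sequence length.

```python
# Effective pretraining tokens = steps * batch size * input length.
steps, batch_size, seq_len = 1_740_000, 48, 512
print(f"{steps * batch_size * seq_len:,}")  # 42,762,240,000 -> ~42.7B
```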
Core Capabilities
- Text generation and understanding in Turkish
- Paraphrasing and summarization (a paraphrasing sketch follows this list)
- Named entity recognition (NER)
- Part-of-speech (POS) tagging
- Semantic textual similarity (STS)
- Natural language inference
- Text classification
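The generation-style capabilities map onto the same text2text interface. The sketch below shows paraphrasing through a `pipeline`; the fine-tuned checkpoint id is hypothetical, so substitute a real TURNA variant from the maintainers' listings.

```python
# Paraphrasing sketch; the checkpoint id below is hypothetical.
from transformers import pipeline

paraphraser = pipeline(
    "text2text-generation",
    model="boun-tabi-LMG/turna_paraphrasing",  # placeholder id
)
result = paraphraser("Bu film gerçekten çok güzeldi.", max_new_tokens=64)
print(result[0]["generated_text"])
```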
Frequently Asked Questions
Q: What makes this model unique?
TURNA is trained specifically for Turkish. In the evaluations reported in the accompanying paper, it outperforms multilingual models and competes with monolingual Turkish models on understanding tasks, and its UL2-based architecture and training approach make it effective for both comprehension and generation.
Q: What are the recommended use cases?
The model is released for non-commercial academic research and is suited to Turkish NLP tasks including text generation, summarization, paraphrasing, and classification. It can also be fine-tuned on custom Turkish tasks using the accompanying library.
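For custom tasks, fine-tuning can also go through the standard `transformers` Seq2SeqTrainer, as sketched below. The dataset name and column names are placeholders, and the base checkpoint id `boun-tabi-LMG/TURNA` is assumed; the maintainers' own tuning library may offer a more direct path.

```python
# Fine-tuning sketch with Seq2SeqTrainer; dataset and columns are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "boun-tabi-LMG/TURNA"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical Turkish summarization dataset with "text"/"summary" columns.
dataset = load_dataset("my_turkish_summarization_dataset")

def preprocess(batch):
    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=512, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="turna-finetuned",
        per_device_train_batch_size=8,
        learning_rate=1e-4,
        num_train_epochs=3,
        predict_with_generate=True,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```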