tr_core_news_trf
Property | Value |
---|---|
License | CC-BY-SA 4.0 |
Paper | ACL 2023 Paper |
Author | turkish-nlp-suite |
spaCy Version | >=3.4.2, <3.5.0 |
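The spaCy version range in the table can be checked before the model is loaded. The following is a minimal sketch under stated assumptions: the helper names `parse_release` and `spacy_is_compatible` are hypothetical, and the comparison is a simplified reading of `>=3.4.2, <3.5.0` that looks only at the numeric release fields and does not apply full pre-release ordering rules.

```python
from importlib import metadata


def parse_release(version_text):
    """Return the leading numeric fields of a version string as a tuple.

    Hypothetical helper: "3.4.4" -> (3, 4, 4); parsing stops at the first
    field that does not start with a digit.
    """
    fields = []
    for field in version_text.split("."):
        digits = ""
        for char in field:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        fields.append(int(digits))
    return tuple(fields)


def spacy_is_compatible(version_text, minimum=(3, 4, 2), upper=(3, 5, 0)):
    """Check a version string against the range stated in the table."""
    return minimum <= parse_release(version_text) < upper


if __name__ == "__main__":
    try:
        installed = metadata.version("spacy")
    except metadata.PackageNotFoundError:
        print("spaCy is not installed")
    else:
        print(installed, "compatible:", spacy_is_compatible(installed))
```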
What is tr_core_news_trf?
tr_core_news_trf is a transformer-based spaCy pipeline for Turkish, designed for Natural Language Processing tasks such as tagging, morphological analysis, lemmatization, dependency parsing, and named entity recognition. It is one of the first spaCy models trained specifically for Turkish and offers state-of-the-art performance across multiple NLP tasks.
Implementation Details
The pipeline consists of transformer, tagger, morphologizer, trainable lemmatizer, parser, and named entity recognition components. It is built on the cased dbmdz Turkish BERT model and trained on UD Turkish BOUN, the Turkish Wiki NER dataset, and PANX/WikiANN. Reported scores:
- NER F-Score: 91.31%
- POS Tagging Accuracy: 91.74%
- Morphological Analysis Accuracy: 91.45%
- Lemmatization Accuracy: 87.82%
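The component list above can be compared against a loaded pipeline. This is a minimal sketch, assuming the package is installed under the name `tr_core_news_trf` and that its components use spaCy's default factory names (`trainable_lemmatizer`, `ner`, and so on); the constant `EXPECTED_COMPONENTS` and the helper names are hypothetical.

```python
# Assumed component names, based on spaCy's default factory names.
EXPECTED_COMPONENTS = (
    "transformer",
    "tagger",
    "morphologizer",
    "trainable_lemmatizer",
    "parser",
    "ner",
)


def missing_components(pipe_names, expected=EXPECTED_COMPONENTS):
    """Return the expected component names absent from pipe_names."""
    present = set(pipe_names)
    return [name for name in expected if name not in present]


def load_pipeline(name="tr_core_news_trf"):
    """Load the installed pipeline; spaCy is imported lazily."""
    import spacy

    return spacy.load(name)


if __name__ == "__main__":
    try:
        nlp = load_pipeline()
    except (ImportError, OSError, ValueError) as err:
        # Raised when spaCy, the model package, or a required factory is missing.
        print(f"Could not load the pipeline: {err}")
    else:
        print("Components:", nlp.pipe_names)
        print("Missing:", missing_components(nlp.pipe_names))
```

If the actual component names differ from the assumed ones, `nlp.pipe_names` shows what the installed package provides.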
Core Capabilities
- Named Entity Recognition supporting 20 entity types, including PERSON, ORG, and GPE
- Morphological analysis tuned for Turkish
- Dependency parsing with a labeled attachment score (LAS) of 71.89%
- Sentence boundary detection with an F-score of 87.65%
- Support for an extensive tag set of over 1,500 morphological tags
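The capabilities listed above map onto standard spaCy `Doc` and `Token` attributes. The sketch below shows one way to read them out; the helper names and the example sentence are illustrative assumptions, and no particular output is implied.

```python
def entity_spans(doc):
    """Return (text, label) pairs for the named entities in a Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def token_analysis(doc):
    """Return per-token POS, morphology, lemma, and dependency data."""
    return [
        {
            "text": token.text,
            "pos": token.pos_,
            "morph": str(token.morph),
            "lemma": token.lemma_,
            "dep": token.dep_,
            "head": token.head.text,
        }
        for token in doc
    ]


def sentence_texts(doc):
    """Return the text of each detected sentence."""
    return [sent.text for sent in doc.sents]


if __name__ == "__main__":
    try:
        import spacy

        nlp = spacy.load("tr_core_news_trf")
    except (ImportError, OSError, ValueError) as err:
        print(f"Could not load the pipeline: {err}")
    else:
        doc = nlp("Ali dün Ankara'ya gitti.")  # illustrative sentence
        print(entity_spans(doc))
        print(sentence_texts(doc))
        for row in token_analysis(doc):
            print(row)
```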
Frequently Asked Questions
Q: What makes this model unique?
This model is one of the first comprehensive spaCy models for Turkish. It offers state-of-the-art performance across multiple NLP tasks and is freely available under the CC-BY-SA 4.0 license. It is particularly notable for its extensive morphological analysis, which is crucial for processing Turkish effectively.
Q: What are the recommended use cases?
The model suits applications that require detailed analysis of Turkish text, including text classification, named entity recognition, morphological analysis, dependency parsing, and general NLP pipelines for Turkish content. It is particularly suitable for academic research and industrial applications that need deep linguistic analysis.
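For corpus-level use cases such as entity extraction over many documents, spaCy's `nlp.pipe` processes texts in batches. The following is a hedged sketch: `count_entity_labels` is a hypothetical helper, and the texts and batch size are placeholder assumptions.

```python
from collections import Counter


def count_entity_labels(docs):
    """Count entity labels across an iterable of processed Docs."""
    counts = Counter()
    for doc in docs:
        counts.update(ent.label_ for ent in doc.ents)
    return counts


if __name__ == "__main__":
    texts = [
        "Ali dün Ankara'ya gitti.",
        "Ayşe İstanbul'da çalışıyor.",
    ]  # illustrative inputs
    try:
        import spacy

        nlp = spacy.load("tr_core_news_trf")
    except (ImportError, OSError, ValueError) as err:
        print(f"Could not load the pipeline: {err}")
    else:
        # nlp.pipe streams Docs, which avoids per-call overhead on large corpora.
        label_counts = count_entity_labels(nlp.pipe(texts, batch_size=16))
        print(label_counts.most_common())
```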