tr_core_news_trf

Maintained By
turkish-nlp-suite

Property        Value
License         CC-BY-SA 4.0
Paper           ACL 2023 Paper
Author          turkish-nlp-suite
spaCy Version   >=3.4.2, <3.5.0
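
The spaCy version range above is easy to get wrong when pinning dependencies. The following is a minimal sketch of a check against that range, using only the standard library. The helper names `version_tuple` and `is_compatible` are hypothetical and are not part of spaCy or of this model's package.

```python
import re


def version_tuple(version):
    """Convert a version string such as "3.4.4" into a tuple of ints.

    Only the first three numeric components are kept, so suffixes like
    ".dev0" are ignored. This is a simplification, not full PEP 440 parsing.
    """
    numbers = re.findall(r"\d+", version)[:3]
    return tuple(int(n) for n in numbers)


def is_compatible(version, lower=(3, 4, 2), upper=(3, 5, 0)):
    """Return True if lower <= version < upper (the range listed above)."""
    return lower <= version_tuple(version) < upper
```

With spaCy installed, `is_compatible(spacy.__version__)` would indicate whether the installed release falls inside the listed range.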

What is tr_core_news_trf?

tr_core_news_trf is a transformer-based spaCy pipeline for Turkish. It covers part-of-speech tagging, morphological analysis, lemmatization, dependency parsing, and named entity recognition, and it is one of the first spaCy models trained specifically for Turkish.

Implementation Details

The pipeline consists of transformer, tagger, morphologizer, trainable lemmatizer, parser, and named entity recognition components. It is built on the dbmdz Turkish BERT model (cased) and trained on UD Turkish BOUN, the Turkish Wiki NER dataset, and PANX/WikiANN. Reported scores:

  • NER F-Score: 91.31%
  • POS Tagging Accuracy: 91.74%
  • Morphological Analysis Accuracy: 91.45%
  • Lemmatization Accuracy: 87.82%
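
The components listed above each write their results onto spaCy `Token` objects. The sketch below shows one way to read those annotations. It assumes the model package has already been installed under the name `tr_core_news_trf` in an environment with a compatible spaCy version. The helper `describe_tokens` and the example sentence are illustrative only.

```python
def describe_tokens(doc):
    """Collect the per-token annotations set by the pipeline components."""
    rows = []
    for token in doc:
        rows.append({
            "text": token.text,
            "pos": token.pos_,           # coarse part-of-speech tag
            "morph": str(token.morph),   # morphological features
            "lemma": token.lemma_,       # trainable lemmatizer output
            "dep": token.dep_,           # dependency label from the parser
            "head": token.head.text,     # syntactic head of this token
        })
    return rows


def main():
    try:
        import spacy
        # Assumption: the package is installed under this name.
        nlp = spacy.load("tr_core_news_trf")
    except (ImportError, OSError) as err:
        print(f"spaCy or the model is not available: {err}")
        return
    print(nlp.pipe_names)  # names of the components in the loaded pipeline
    doc = nlp("Ankara Türkiye'nin başkentidir.")
    for row in describe_tokens(doc):
        print(row)


if __name__ == "__main__":
    main()
```

Keeping the extraction logic in a function that only takes a `doc` makes it reusable with any spaCy pipeline that sets the same attributes.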

Core Capabilities

  • Named entity recognition with 20 entity types, including PERSON, ORG, and GPE
  • Morphological analysis tuned for Turkish
  • Dependency parsing with a labeled attachment score of 71.89%
  • Sentence boundary detection with an F-score of 87.65%
  • A tag set of more than 1,500 morphological tags
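
Entities and sentence boundaries from the capabilities above are exposed through the standard spaCy attributes `doc.ents` and `doc.sents`. The following is a minimal sketch of collecting both from a processed document. The function name `entities_and_sentences` is hypothetical, and `doc` is assumed to come from a loaded `tr_core_news_trf` pipeline.

```python
from collections import defaultdict


def entities_and_sentences(doc):
    """Group entity texts by label and list the sentence texts of a doc."""
    entities = defaultdict(list)
    for ent in doc.ents:
        entities[ent.label_].append(ent.text)  # e.g. PERSON, ORG, GPE
    sentences = [sent.text for sent in doc.sents]
    return dict(entities), sentences
```

Given `doc = nlp(text)`, the call `entities_and_sentences(doc)` returns a dictionary keyed by entity label and a list of sentence strings.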

Frequently Asked Questions

Q: What makes this model unique?

This model is one of the first spaCy pipelines for Turkish that covers tagging, morphology, lemmatization, parsing, and named entity recognition, and it is freely available under the CC-BY-SA 4.0 license. Its morphological analysis is especially notable, because Turkish is morphologically rich and most downstream processing depends on handling word forms correctly.

Q: What are the recommended use cases?

The model suits applications that need detailed analysis of Turkish text, including text classification, named entity recognition, morphological analysis, dependency parsing, and general NLP pipelines for Turkish content. It is a good fit for both academic research and industrial applications that require deep linguistic analysis.
