berturk-sunlp-ner-turkish
| Property | Value |
|---|---|
| Author | busecarik |
| Framework | PyTorch, Transformers |
| Task | Named Entity Recognition |
| Language | Turkish |
| Dataset | SUNLP-NER-Twitter (5000 tweets) |
| Overall F1 Score | 82.69% |
What is berturk-sunlp-ner-turkish?
berturk-sunlp-ner-turkish is a Named Entity Recognition (NER) model for Turkish text, fine-tuned from the BERTurk-cased model. It identifies and classifies seven entity types: Person, Location, Organization, Time, Money, Product, and TV-Show.
Implementation Details
The model is built with PyTorch and the Transformers library and can be loaded through the standard Hugging Face tooling. Its strongest results are on Person (91% F1) and Time (89% F1) entities.
- Fine-tuned from the BERTurk-cased model
- Uses the SUNLP-NER-Twitter dataset of 5000 Turkish tweets
- Supports 7 entity types
- Achieves 82.96% precision and 82.42% recall overall (82.69% F1)
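Loading the model through Transformers might look like the sketch below. The Hub id is an assumption formed by joining the author and model name from this card, the example sentence is illustrative, and `load_ner`, `entities_by_type`, and `main` are hypothetical helper names. The label strings the model emits depend on its configuration and are not listed here.

```python
from collections import defaultdict

# Assumed Hub id: author and model name from this card joined with "/".
MODEL_ID = "busecarik/berturk-sunlp-ner-turkish"


def load_ner(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the grouping helper works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def entities_by_type(predictions):
    """Group aggregated pipeline output by its entity_group label."""
    grouped = defaultdict(list)
    for pred in predictions:
        grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


def main():
    ner = load_ner()
    # Illustrative sentence, not taken from the dataset.
    text = "Ahmet yarın İstanbul'a gidiyor."
    print(entities_by_type(ner(text)))
```

Calling `main()` runs the full example. Each prediction also carries a `score` field, which can be used to drop low-confidence spans before grouping.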
Core Capabilities
- Person detection (90% precision, 91% recall)
- Location identification (70% precision, 80% recall)
- Organization recognition (78% precision, 86% recall)
- Time expression detection (94% precision, 85% recall)
- Money amount recognition (80% precision, 71% recall)
- Product identification (44% precision, 47% recall)
- TV-Show detection (61% precision, 35% recall)
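The F1 figures on this card follow from precision and recall as their harmonic mean. The sketch below recomputes them from the numbers listed above; `f1_score` is a hypothetical helper name. Because the per-entity inputs are already rounded to whole percentages, the recomputed per-entity values can differ slightly from F1 scores derived from unrounded figures.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported on this card, in percent.
overall_f1 = f1_score(82.96, 82.42)

# Per-entity (precision, recall) pairs from the list above, in percent.
per_entity = {
    "Person": (90, 91),
    "Location": (70, 80),
    "Organization": (78, 86),
    "Time": (94, 85),
    "Money": (80, 71),
    "Product": (44, 47),
    "TV-Show": (61, 35),
}
per_entity_f1 = {name: f1_score(p, r) for name, (p, r) in per_entity.items()}

print(f"Overall F1: {overall_f1:.2f}%")  # prints "Overall F1: 82.69%"
for name, value in per_entity_f1.items():
    print(f"{name}: {value:.1f}%")
```

The overall result matches the 82.69% reported in the table, which confirms that the headline precision, recall, and F1 values are mutually consistent.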
Frequently Asked Questions
Q: What makes this model unique?
This model targets Turkish social media text. It was fine-tuned on a dataset of 5000 Turkish tweets, so it is suited to the informal language typical of Twitter. It also covers seven entity types, adding Time, Money, Product, and TV-Show to the more common Person, Location, and Organization categories.
Q: What are the recommended use cases?
The model is suited to Turkish social media analysis, information extraction from tweets, and other Turkish text processing tasks that need entity recognition. It is strongest on person names and time expressions, which makes it useful for social media monitoring. Product and TV-Show results are considerably weaker (47% and 35% recall respectively), so predictions for those types warrant manual review.