hu_core_news_md
| Property | Value |
|---|---|
| License | CC BY-SA 4.0 |
| Author | huspacy |
| Downloads | 1,270 |
What is hu_core_news_md?
hu_core_news_md is a medium-sized Hungarian language model for the spaCy ecosystem, published by huspacy. It covers sentence segmentation, part-of-speech tagging, morphological analysis, lemmatization, dependency parsing, and named entity recognition.
Implementation Details
The model is built on spaCy's pipeline architecture and annotates Hungarian text at the token level. Its reported scores are:
- Named Entity Recognition (NER): 84.78% F-score (Precision: 84.99%, Recall: 84.56%)
- Part-of-Speech Tagging: 96.85% accuracy for Universal POS
- Lemmatization: 97.41% accuracy
- Morphological Analysis: 94.32% accuracy
- Dependency Parsing: 74.25% LAS, 81.84% UAS
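The NER F-score above is the harmonic mean of the reported precision and recall, and the two parsing scores differ only in whether the dependency label must match in addition to the head. The following sketch uses only the Python standard library to recompute the F-score from the rounded figures and to illustrate UAS versus LAS on a made-up four-token parse. The function names and the toy data are illustrative assumptions, not part of the model or of spaCy.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def attachment_scores(gold, predicted):
    """Return (UAS, LAS) in percent for parallel lists of (head_index, label)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one token is required")
    # UAS counts tokens whose head is correct; LAS also requires the label to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    n = len(gold)
    return 100 * head_hits / n, 100 * full_hits / n


# Precision and recall as reported above (already rounded to two decimals).
ner_f = f_score(84.99, 84.56)
print(f"NER F-score recomputed from rounded P/R: {ner_f:.2f}")

# Hypothetical toy parse: the third token has the right head but the wrong label,
# and the fourth token has the wrong head.
gold = [(1, "nsubj"), (1, "ROOT"), (3, "det"), (1, "obj")]
pred = [(1, "nsubj"), (1, "ROOT"), (3, "amod"), (0, "obj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f}%  LAS={las:.1f}%")
```

Because the published precision and recall are themselves rounded, the recomputed F-score can differ from the reported 84.78% in the last decimal place. LAS can never exceed UAS, which is consistent with the 74.25% and 81.84% figures above.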
Core Capabilities
- Sentence boundary detection (98% F-score)
- Morphological feature recognition
- Part-of-speech tagging with both language-specific XPOS tags (97.11%) and Universal UPOS tags (96.85%)
- Dependency parsing
- Named entity recognition for Hungarian text
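These capabilities map onto standard spaCy token and document attributes. The sketch below assumes that spaCy and the hu_core_news_md package are already installed in the current environment. The helper names `load_pipeline`, `token_rows`, `sentence_texts`, and `print_analysis` are illustrative and not part of the model.

```python
def load_pipeline(name="hu_core_news_md"):
    """Load the installed pipeline; spaCy is imported lazily on first use."""
    import spacy  # assumption: spaCy and the model package are installed

    return spacy.load(name)


def token_rows(doc):
    """One tuple per token: text, lemma, UPOS, XPOS, morphology, dep label, head text."""
    return [
        (tok.text, tok.lemma_, tok.pos_, tok.tag_, str(tok.morph), tok.dep_, tok.head.text)
        for tok in doc
    ]


def sentence_texts(doc):
    """Sentence strings taken from the predicted sentence boundaries."""
    return [sent.text for sent in doc.sents]


def print_analysis(text, nlp=None):
    """Run the pipeline on `text`, print one tab-separated line per token, return the Doc."""
    if nlp is None:
        nlp = load_pipeline()
    doc = nlp(text)
    for row in token_rows(doc):
        print("\t".join(row))
    return doc
```

With the model installed, `print_analysis("Budapest Magyarország fővárosa.")` prints one row per token. The sentence is only a placeholder, and the exact tags depend on the installed model version.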
Frequently Asked Questions
Q: What makes this model unique?
The model covers the main Hungarian analysis tasks in a single pipeline. Its strongest reported results are in lemmatization (97.41% accuracy) and POS tagging (96.85% UPOS, 97.11% XPOS), which suits it to detailed linguistic analysis of Hungarian text.
Q: What are the recommended use cases?
The model suits applications that need detailed linguistic analysis of Hungarian text in academic, business, or research contexts, such as information extraction, syntactic parsing, named entity recognition, and preprocessing for text classification.
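For the information extraction use case, one common pattern is to group the recognized entities by label. The sketch below works on any spaCy `Doc` that exposes `ents`. The helper name `entities_by_label` is hypothetical, and no particular label set is assumed because the labels depend on the model.

```python
from collections import defaultdict


def entities_by_label(doc):
    """Map each entity label to the entity texts found in `doc`, in document order."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Usage, assuming spaCy and hu_core_news_md are installed:
#   import spacy
#   nlp = spacy.load("hu_core_news_md")
#   print(entities_by_label(nlp("...")))  # replace "..." with Hungarian text
```

Returning a plain dictionary keeps the result easy to serialize or to feed into downstream business logic.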