hu_core_news_md

huspacy

Hungarian-language NLP model with strong performance in NER (85% F-score), POS tagging (97% accuracy), and lemmatization (97% accuracy). Part of the spaCy ecosystem.

Property	Value
License	CC BY-SA 4.0
Author	huspacy
Downloads	1,270

What is hu_core_news_md?

hu_core_news_md is a medium-sized Hungarian language model published by huspacy as part of the spaCy ecosystem. It covers the main natural language processing tasks in a single pipeline: sentence boundary detection, part-of-speech tagging, morphological analysis, lemmatization, dependency parsing, and named entity recognition.

Implementation Details

The model is built on spaCy's pipeline architecture and performs several token-level classification tasks. Its reported scores are:

Named Entity Recognition (NER): 84.78% F-score (Precision: 84.99%, Recall: 84.56%)
Part-of-Speech Tagging: 96.85% accuracy for Universal POS
Lemmatization: 97.41% accuracy
Morphological Analysis: 94.32% accuracy
Dependency Parsing: 74.25% LAS, 81.84% UAS
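As a quick sanity check on the NER figures, the F-score is the harmonic mean of precision and recall. The sketch below assumes the card reports the standard balanced F1 (the card does not say which F-measure is used); the helper name `f_score` is illustrative and not part of spaCy.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# NER precision and recall as listed above, in percent.
ner_f = f_score(84.99, 84.56)
print(f"NER F-score recomputed from P and R: {ner_f:.2f}%")
```

The recomputed value comes out at about 84.77%, which matches the listed 84.78% to within the rounding of the published precision and recall.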

Core Capabilities

High-accuracy sentence boundary detection (98% F-score)
Morphological feature recognition (94.32% accuracy)
Part-of-speech tagging with both XPOS (97.11%) and UPOS (96.85%) tag sets
Dependency parsing (74.25% LAS, 81.84% UAS)
Named entity recognition for Hungarian text (84.78% F-score)
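The capabilities listed above map onto standard spaCy `Doc` and `Token` attributes. The following is a minimal sketch, assuming the model package is already installed under the name `hu_core_news_md`; the helper names `load_pipeline`, `token_rows`, and `sentence_texts` are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List


def load_pipeline(name: str = "hu_core_news_md") -> Any:
    """Load the pipeline; assumes spaCy and the model package are installed."""
    import spacy  # imported lazily so the helpers below work without it

    return spacy.load(name)


def token_rows(doc: Any) -> List[Dict[str, str]]:
    """Collect per-token annotations from a processed spaCy Doc."""
    return [
        {
            "text": token.text,
            "lemma": token.lemma_,        # lemmatization
            "upos": token.pos_,           # Universal POS tag
            "xpos": token.tag_,           # fine-grained (XPOS) tag
            "morph": str(token.morph),    # morphological features
            "dep": token.dep_,            # dependency relation
            "head": token.head.text,      # syntactic head of the token
        }
        for token in doc
    ]


def sentence_texts(doc: Any) -> List[str]:
    """Return the text of each detected sentence."""
    return [sent.text for sent in doc.sents]
```

With the model installed, `token_rows(load_pipeline()("Budapest Magyarország fővárosa."))` would return one dictionary per token; the exact tags depend on the model version.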

Frequently Asked Questions

Q: What makes this model unique?

This model covers the full range of Hungarian language processing tasks in one pipeline, with its highest accuracy in lemmatization (97.41%) and POS tagging (96.85% UPOS, 97.11% XPOS), which makes it well suited to detailed linguistic analysis of Hungarian text.

Q: What are the recommended use cases?

The model is well suited to applications that require detailed linguistic analysis of Hungarian text, such as text classification, information extraction, syntactic parsing, and named entity recognition in academic, business, or research contexts.
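For the information extraction use case, one possible approach is to run a batch of texts through the pipeline and group the recognized entities by label. This is a sketch under the assumption that the model is installed as `hu_core_news_md`; `entities_by_label` and `extract_from_texts` are hypothetical helper names, and the set of entity labels returned depends on the model.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def entities_by_label(docs: Iterable[Any]) -> Dict[str, List[str]]:
    """Group entity texts from processed spaCy Docs by their entity label."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        for ent in doc.ents:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


def extract_from_texts(texts: Iterable[str],
                       model_name: str = "hu_core_news_md") -> Dict[str, List[str]]:
    """Load the pipeline and extract entities from a batch of texts."""
    import spacy  # assumes spaCy and the model package are installed

    nlp = spacy.load(model_name)
    # nlp.pipe processes texts as a stream, which is more efficient
    # than calling nlp() on each text separately.
    return entities_by_label(nlp.pipe(texts))
```

Keeping the grouping logic separate from model loading means it can be reused with any spaCy pipeline that provides `doc.ents`.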