hu_core_news_lg

huspacy

Large Hungarian spaCy model for NLP tasks, with high reported accuracy (97.6% lemmatization, 96.7% POS tagging). Specialized for token classification.

Property	Value
Author	huspacy
License	CC BY-SA 4.0
Downloads	1,511

What is hu_core_news_lg?

hu_core_news_lg is a large Hungarian language model for the spaCy framework. A single pipeline covers the core token-level tasks of Hungarian text analysis: sentence splitting, part-of-speech tagging, morphological analysis, lemmatization, dependency parsing and named entity recognition.
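A minimal usage sketch, assuming the model package is already installed under the name `hu_core_news_lg` so that the standard `spacy.load` call can find it (installation is not covered here). The helper `token_rows` is a hypothetical name that only reads standard spaCy `Token` attributes; the sample sentences ("It rained in Budapest yesterday.", "The cats slept.") are illustrative. `nlp.pipe` streams several texts through the pipeline in batches.

```python
from typing import Any, Iterable


def token_rows(doc: Iterable[Any]) -> list[tuple[str, str, str, str]]:
    """Return (text, lemma, UPOS, morphology) for every token in a spaCy Doc."""
    # str(token.morph) renders the morphological features as "Feat=Value|...".
    return [(t.text, t.lemma_, t.pos_, str(t.morph)) for t in doc]


if __name__ == "__main__":
    try:
        import spacy

        # Assumes the model package is installed; spacy.load raises OSError if not.
        nlp = spacy.load("hu_core_news_lg")
    except (ImportError, OSError) as exc:
        print(f"Model not available: {exc}")
    else:
        print(nlp.pipe_names)  # components bundled in the pipeline
        texts = ["Budapesten tegnap esett az eső.", "A macskák aludtak."]
        # nlp.pipe processes the texts in batches rather than one call per text.
        for doc in nlp.pipe(texts, batch_size=32):
            for row in token_rows(doc):
                print(*row, sep="\t")
```

Because component names can differ between releases, printing `nlp.pipe_names` is a safer first step than hard-coding names before disabling anything.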

Implementation Details

The model combines several NLP components in a single spaCy pipeline and reaches high accuracy across the linguistic tasks it covers. Because it uses spaCy's standard pipeline architecture, it can be loaded and deployed like any other spaCy model.

Named Entity Recognition (NER): 86.9% F-score
Part-of-Speech Tagging: 96.6% accuracy
Lemmatization: 97.6% accuracy
Morphological Analysis: 93.4% accuracy
Dependency Parsing: 78.1% LAS (labeled attachment score)
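To make clear what each score counts, the sketch below computes token-level accuracy (the measure behind the POS, lemma and morphology figures), labeled attachment score, and exact-match F-score (used for NER spans and, with sentence-start offsets, for sentence boundaries) on toy data. It is a simplified stand-in written for illustration, not the evaluation code that produced the reported numbers; in particular it assumes gold and predicted tokenization are identical.

```python
from typing import AbstractSet, Hashable, Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Token-level accuracy, as used for POS, lemma and morphology scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("expects non-empty, identically tokenized sequences")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def las(gold: Sequence[tuple[int, str]], pred: Sequence[tuple[int, str]]) -> float:
    """Labeled attachment score: a token counts only if head AND label are correct."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("expects non-empty, identically tokenized sequences")
    correct = sum(gh == ph and gl == pl for (gh, gl), (ph, pl) in zip(gold, pred))
    return correct / len(gold)


def f_score(gold: AbstractSet[Hashable], pred: AbstractSet[Hashable]) -> float:
    """Exact-match F1 over items such as (start, end, label) spans or sentence starts."""
    true_pos = len(gold & pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(accuracy(["NOUN", "VERB", "DET"], ["NOUN", "VERB", "NOUN"]))  # 2 of 3 correct
    gold_ents = {(0, 1, "LOC"), (3, 5, "PER")}
    pred_ents = {(0, 1, "LOC"), (3, 4, "PER")}  # second span has a wrong boundary
    print(f_score(gold_ents, pred_ents))  # 0.5
```

Exact matching explains why NER scores trail tagging scores: a span with one wrong boundary token earns no credit at all.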

Core Capabilities

High-accuracy lemmatization and morphological analysis
Named entity recognition (86.9% F-score)
Robust sentence boundary detection (98.7% F-score)
Comprehensive POS tagging with both UPOS and XPOS support
Dependency parsing for syntactic analysis
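The capabilities listed above map onto standard spaCy `Doc` and `Token` attributes. The sketch below gathers them in one place; `describe` is a hypothetical helper, the model is assumed to be installed, and the example text ("János Kovács lives in Debrecen. Tomorrow he travels to Pécs.") is illustrative. In spaCy, `token.pos_` holds the coarse UPOS tag and `token.tag_` the fine-grained tag.

```python
from typing import Any


def describe(doc: Any) -> dict[str, list]:
    """Gather sentence, entity, tag and dependency output from a processed spaCy Doc."""
    return {
        # Sentence boundaries predicted by the pipeline.
        "sentences": [sent.text for sent in doc.sents],
        # Named entities as (text, label) pairs.
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
        # pos_ is the UPOS tag; tag_ is the fine-grained, treebank-specific tag.
        "tags": [(t.text, t.pos_, t.tag_) for t in doc],
        # dep_ is the relation label; head is the token this one attaches to.
        "deps": [(t.text, t.dep_, t.head.text) for t in doc],
    }


if __name__ == "__main__":
    try:
        import spacy

        nlp = spacy.load("hu_core_news_lg")  # assumes the package is installed
    except (ImportError, OSError) as exc:
        print(f"Model not available: {exc}")
    else:
        doc = nlp("Kovács János Debrecenben él. Holnap Pécsre utazik.")
        for key, values in describe(doc).items():
            print(key, values)
```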

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for covering the main Hungarian token-level tasks in a single pipeline, with particularly strong lemmatization (97.6%) and POS tagging (96.7%). It is one of the most accurate publicly available models for Hungarian NLP.

Q: What are the recommended use cases?

The model suits applications that need detailed analysis of Hungarian text, such as information extraction, linguistic research, automated text processing systems, and preprocessing for document classification. It is particularly strong where accurate morphological analysis and lemmatization are required.
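Lemmatization matters for Hungarian because one word appears in many suffixed forms (for example *ház*, *házban*, *házak*: "house", "in the house", "houses"). As a small information-extraction example, the sketch below counts content-word lemmas so those forms are merged into one entry. It assumes the model is installed; `lemma_counts` and the `CONTENT_POS` tag set are hypothetical, illustrative choices rather than part of the model.

```python
from collections import Counter
from typing import AbstractSet, Any, Iterable

# Illustrative choice of UPOS tags treated as content words (not part of the model).
CONTENT_POS = frozenset({"NOUN", "PROPN", "VERB", "ADJ"})


def lemma_counts(doc: Iterable[Any], keep: AbstractSet[str] = CONTENT_POS) -> Counter:
    """Count case-folded lemmas of content words, merging inflected forms of one word."""
    return Counter(t.lemma_.lower() for t in doc if t.pos_ in keep)


if __name__ == "__main__":
    try:
        import spacy

        nlp = spacy.load("hu_core_news_lg")  # assumes the package is installed
    except (ImportError, OSError) as exc:
        print(f"Model not available: {exc}")
    else:
        doc = nlp("A házban két ház képe lóg. A házak régiek.")
        print(lemma_counts(doc).most_common(3))
```

Counting surface forms instead of lemmas would split *házban*, *ház* and *házak* into three separate keys.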