pl_core_news_lg
| Property | Value |
|---|---|
| License | GPL 3.0 |
| Vector Dimensions | 300 |
| Vocabulary Size | 500,000 keys |
| spaCy Version | >=3.7.0,<3.8.0 |
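
The spaCy version constraint in the table can be checked before the model is loaded. The following is a minimal sketch using only the Python standard library. `parse_version` and `satisfies` are hypothetical helpers written for this illustration, not part of spaCy, and they handle only the `>=` and `<` operators that appear in the constraint. Pre-release suffixes such as `rc1` are ignored, which is a simplification.

```python
import re
from importlib import metadata

SPACY_CONSTRAINT = ">=3.7.0,<3.8.0"  # copied from the table above


def parse_version(text):
    """Turn '3.7.4' into (3, 7, 4), keeping only the leading digits of each part."""
    parts = []
    for piece in text.strip().split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '3.7'
        parts.append(0)
    return tuple(parts)


def satisfies(version, constraint=SPACY_CONSTRAINT):
    """Check a version string against comma-separated '>=' and '<' clauses."""
    current = parse_version(version)
    for clause in (c.strip() for c in constraint.split(",")):
        if clause.startswith(">="):
            if current < parse_version(clause[2:]):
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if current >= parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def installed_spacy_version():
    """Return the installed spaCy version string, or None if spaCy is absent."""
    try:
        return metadata.version("spacy")
    except metadata.PackageNotFoundError:
        return None


version = installed_spacy_version()
if version is None:
    print("spaCy is not installed")
else:
    print(f"spaCy {version} compatible: {satisfies(version)}")
```

For anything beyond this simple range, a full version-specifier parser would be a safer choice than the hand-written comparison shown here.
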
What is pl_core_news_lg?
pl_core_news_lg is a Polish language pipeline for the spaCy framework, optimized for CPU use. It bundles part-of-speech tagging, morphological analysis, lemmatization, dependency parsing, sentence segmentation, and named entity recognition for Polish text, together with 300-dimensional word vectors.
Implementation Details
The pipeline consists of the tok2vec, morphologizer, parser, lemmatizer, tagger, senter, and named entity recognition (ner) components. It ships 500,000 unique word vectors with 300 dimensions and was trained on a combination of the National Corpus of Polish, UD Polish PDB, and Explosion fastText vectors. Reported evaluation scores:
- Named Entity Recognition: 84.74% precision, 83.56% recall
- POS Tagging: 98.29% accuracy
- Morphological Analysis: 90.98% accuracy
- Dependency Parsing: 89.50% UAS, 82.38% LAS
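
The pipeline components and vector table described above can be inspected after loading the model. This is a minimal sketch, assuming the model package has been installed (for example with `python -m spacy download pl_core_news_lg`). `load_model` and `missing_components` are hypothetical helpers for this illustration. The sketch uses `nlp.component_names` rather than `nlp.pipe_names` on the assumption that `senter` may ship disabled by default, in which case it would appear only in the former.

```python
# Component names as listed in this document.
EXPECTED = ("tok2vec", "morphologizer", "parser", "lemmatizer",
            "tagger", "senter", "ner")


def load_model(name="pl_core_news_lg"):
    """Load the pipeline, or return None if spaCy or the model is not installed."""
    try:
        import spacy
        return spacy.load(name)
    except (ImportError, OSError):
        return None


def missing_components(component_names, expected=EXPECTED):
    """Return the expected components that are absent, in the expected order."""
    present = set(component_names)
    return [name for name in expected if name not in present]


nlp = load_model()
if nlp is None:
    print("pl_core_news_lg is not available in this environment")
else:
    # component_names includes disabled components; pipe_names only active ones.
    print("all components:   ", nlp.component_names)
    print("active components:", nlp.pipe_names)
    print("missing:          ", missing_components(nlp.component_names))
    # Shape is (number of vectors, vector dimensions).
    print("vector table:     ", nlp.vocab.vectors.shape)
```
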
Core Capabilities
- Morphological analysis that covers Polish inflection
- Named entity recognition for dates, geographic names, organizations, and person names
- Lemmatization (94.25%) and sentence segmentation (96.31% F-score)
- Dependency parsing with 63 dependency relation labels
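
The capabilities listed above map onto token, entity, and sentence attributes of a processed document. The sketch below shows one way to read them, assuming the model is installed. `load_model`, `token_rows`, and `entity_pairs` are hypothetical helpers for this illustration, and the sample sentences are made up for the example. No output is shown because the exact labels depend on the installed model version.

```python
SAMPLE = (
    "Maria Skłodowska-Curie urodziła się w Warszawie w 1867 roku. "
    "Pracowała na Uniwersytecie Paryskim."
)


def load_model(name="pl_core_news_lg"):
    """Load the pipeline, or return None if spaCy or the model is not installed."""
    try:
        import spacy
        return spacy.load(name)
    except (ImportError, OSError):
        return None


def token_rows(doc):
    """One tuple per token: text, lemma, POS tag, morphology, dependency, head."""
    return [
        (t.text, t.lemma_, t.pos_, str(t.morph), t.dep_, t.head.text)
        for t in doc
    ]


def entity_pairs(doc):
    """One (text, label) pair per named entity found in the document."""
    return [(ent.text, ent.label_) for ent in doc.ents]


nlp = load_model()
if nlp is None:
    print("pl_core_news_lg is not available in this environment")
else:
    doc = nlp(SAMPLE)
    for row in token_rows(doc):
        print(*row, sep="\t")
    print("entities: ", entity_pairs(doc))
    print("sentences:", [sent.text for sent in doc.sents])
```
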
Frequently Asked Questions
Q: What makes this model unique?
The model covers the main layers of Polish linguistic analysis in a single pipeline: part-of-speech tagging, morphology, lemmatization, dependency parsing, and named entity recognition, backed by a 500,000-key vector vocabulary. Its morphological analysis is designed specifically for the complexities of Polish.
Q: What are the recommended use cases?
The model suits Polish text analysis tasks such as detailed linguistic analysis, information extraction, text classification, and natural language understanding applications that need grammatical and semantic processing of Polish text.
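
As an illustration of the information extraction use case, the sketch below processes several texts in one batch with `nlp.pipe` and groups the recognized entities by label. It assumes the model is installed. `entities_by_label` and `extract` are hypothetical helpers for this illustration, and the sample texts are made up for the example.

```python
from collections import defaultdict

TEXTS = [
    "Narodowy Bank Polski ma siedzibę w Warszawie.",
    "Wisła jest najdłuższą rzeką w Polsce.",
]


def entities_by_label(docs):
    """Group entity texts by their label across an iterable of documents."""
    grouped = defaultdict(list)
    for doc in docs:
        for ent in doc.ents:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


def extract(texts, model_name="pl_core_news_lg"):
    """Run the pipeline over texts; return None if spaCy or the model is missing."""
    try:
        import spacy
        nlp = spacy.load(model_name)
    except (ImportError, OSError):
        return None
    # nlp.pipe streams documents in batches instead of calling nlp() per text.
    return entities_by_label(nlp.pipe(texts))


result = extract(TEXTS)
if result is None:
    print("pl_core_news_lg is not available in this environment")
else:
    for label, mentions in sorted(result.items()):
        print(label, mentions)
```

Batching through `nlp.pipe` is generally preferable to calling `nlp` on each text in a loop when many documents are processed.
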