ru_core_news_lg
| Property | Value |
|---|---|
| License | MIT |
| Vector Dimensions | 300 |
| Vocabulary Size | 500,002 keys |
| spaCy Version | >=3.7.0,<3.8.0 |
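
The spaCy version range in the table can be checked before loading the model. The following is a minimal sketch: the helper name `is_supported_spacy` is hypothetical, and it assumes a plain `X.Y.Z` style version string, inspecting only the major and minor parts (pre-release suffixes are ignored).

```python
def is_supported_spacy(version: str) -> bool:
    """Return True if `version` falls in the range >=3.7.0,<3.8.0.

    Hypothetical helper: only the leading "major.minor" parts are compared,
    which is sufficient for this particular range.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) == (3, 7)
```

With spaCy installed, it could be called as `is_supported_spacy(spacy.__version__)`.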
What is ru_core_news_lg?
ru_core_news_lg is a large Russian pipeline for spaCy, optimized for CPU and developed by Explosion AI. It bundles several components for tasks such as tagging, parsing, and named entity recognition. Its reported scores include 95.24% precision in Named Entity Recognition (NER) and 98.93% accuracy in Part-of-Speech (POS) tagging.
Implementation Details
The pipeline consists of the following components: tok2vec, morphologizer, parser, senter, attribute_ruler, lemmatizer, and ner. It ships 500,002 unique word vectors with 300 dimensions. Its sources are Nerus, an annotated Russian news corpus, and Navec, a set of pretrained Russian word embeddings, both developed by Alexander Kukushkin. Reported evaluation scores:
- Tokenization Accuracy: 99.68%
- Morphological Analysis Accuracy: 97.49%
- Dependency Parsing (LAS): 95.12%
- Sentence Boundary Detection: 99.86% F-score
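
The component list and vector table described above can be inspected on a loaded pipeline. This is a minimal sketch, assuming the model package is already installed (for example via `python -m spacy download ru_core_news_lg`); the function names `describe_pipeline` and `show_pipeline` are illustrative and not part of spaCy.

```python
def describe_pipeline(nlp):
    """Summarize the components and vector table of a loaded spaCy pipeline."""
    # Vectors.shape is a (rows, dimensions) tuple.
    _, n_dims = nlp.vocab.vectors.shape
    return {
        "components": list(nlp.pipe_names),  # components that run by default
        "disabled": list(nlp.disabled),      # shipped but not run by default
        "vector_keys": nlp.vocab.vectors.n_keys,
        "vector_dims": n_dims,
    }


def show_pipeline():
    # Imported lazily so describe_pipeline has no hard dependency on spaCy.
    import spacy

    # Assumption: the ru_core_news_lg package is installed in this environment.
    nlp = spacy.load("ru_core_news_lg")
    print(describe_pipeline(nlp))
```

Calling `show_pipeline()` prints the summary. Some components may appear under `disabled` rather than `components`, depending on how the installed package is configured.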
Core Capabilities
- Named Entity Recognition for LOC, ORG, and PER entities
- Advanced morphological analysis with 900+ label combinations
- Comprehensive dependency parsing with 40 label types
- High-accuracy sentence segmentation
- Token classification with extensive feature support
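
The capabilities listed above map onto attributes of a processed `Doc`. The sketch below collects them in one place; the function names `analyze` and `show_annotations` and the sample sentence are illustrative assumptions, and the model package is assumed to be installed.

```python
def analyze(doc):
    """Collect entities, sentences, and per-token annotations from a spaCy Doc."""
    return {
        # Named entities with their labels (LOC, ORG, PER for this model).
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
        # Sentence segmentation.
        "sentences": [sent.text for sent in doc.sents],
        # Lemma, POS tag, morphological features, dependency label, and head.
        "tokens": [
            (tok.text, tok.lemma_, tok.pos_, str(tok.morph), tok.dep_, tok.head.text)
            for tok in doc
        ],
    }


def show_annotations():
    # Imported lazily so analyze has no hard dependency on spaCy.
    import spacy

    # Assumption: the ru_core_news_lg package is installed in this environment.
    nlp = spacy.load("ru_core_news_lg")
    doc = nlp("Александр Пушкин родился в Москве. Он учился в Царскосельском лицее.")
    result = analyze(doc)
    for text, label in result["entities"]:
        print(text, label)
    for row in result["tokens"]:
        print(row)
```

Calling `show_annotations()` prints the entities and one row per token for the sample text.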
Frequently Asked Questions
Q: What makes this model unique?
The model combines high reported accuracy across tagging, parsing, and NER with a large vector table (500,002 vectors), which makes it a reasonable choice for production deployments. Because it is optimized for CPU, it can run without a GPU.
Q: What are the recommended use cases?
The model suits Russian text analysis tasks such as Named Entity Recognition, syntactic parsing, morphological analysis, and sentence segmentation. It fits applications that need accurate Russian language processing, for example content analysis and information extraction, and its annotations can serve as features for downstream text classification.
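
As one illustration of the information extraction use case, the sketch below counts entity mentions across a batch of texts. The names `count_entities` and `extract_entity_counts` are hypothetical. Excluding the parser and lemmatizer is an assumption made here to reduce CPU work; it presumes the NER component does not depend on them, which is worth verifying against the installed package.

```python
from collections import Counter


def count_entities(docs, labels=("LOC", "ORG", "PER")):
    """Count (text, label) pairs for entities with the given labels."""
    counts = Counter()
    for doc in docs:
        for ent in doc.ents:
            if ent.label_ in labels:
                counts[(ent.text, ent.label_)] += 1
    return counts


def extract_entity_counts(texts):
    # Imported lazily so count_entities has no hard dependency on spaCy.
    import spacy

    # Assumption: NER does not need the parser or lemmatizer, so they are
    # excluded at load time to save CPU.
    nlp = spacy.load("ru_core_news_lg", exclude=["parser", "lemmatizer"])
    # nlp.pipe processes texts in batches, which is faster than calling
    # nlp() on each text individually.
    return count_entities(nlp.pipe(texts, batch_size=64))
```

Note that this counts surface forms, so different inflected forms of the same Russian name are tallied separately.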