ru_core_news_sm
Property | Value |
---|---|
Version | 3.7.0 |
License | MIT |
Author | Explosion |
Source | Nerus (Alexander Kukushkin) |
spaCy Compatibility | >=3.7.0,<3.8.0 |
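The compatibility row means the package only loads with spaCy 3.7.x. spaCy's own `python -m spacy validate` command checks installed pipelines against the installed spaCy. The stand-alone sketch below shows the same range check with the standard library only. It assumes plain numeric versions such as `3.7.2`, drops pre-release or dev suffixes, and handles only the `>=`, `<=`, `>`, `<` and `==` operators. It is not a full PEP 440 implementation.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Range from the "spaCy Compatibility" row; only >=, <=, >, <, == are handled.
SPACY_RANGE = ">=3.7.0,<3.8.0"


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '3.7.2' into (3, 7, 2), ignoring pre-release/dev suffixes."""
    parts: list[int] = []
    for piece in v.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # e.g. 'dev0': stop at the first non-numeric component
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # e.g. '0rc1': keep 0, drop the suffix and anything after it
    # Pad so that '3.7' compares equal to '3.7.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def in_range(v: str, spec: str = SPACY_RANGE) -> bool:
    """Return True if version string v satisfies every clause in spec."""
    checks = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "==": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    current = parse_version(v)
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators come first so '>=' is not read as '>'.
        op = next((o for o in checks if clause.startswith(o)), None)
        if op is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not checks[op](current, parse_version(clause[len(op):])):
            return False
    return True


if __name__ == "__main__":
    try:
        installed = version("spacy")
    except PackageNotFoundError:
        print("spaCy is not installed")
    else:
        print(f"spaCy {installed} compatible with ru_core_news_sm 3.7.0: {in_range(installed)}")
```

For example, `in_range("3.7.5")` is `True` and `in_range("3.8.0")` is `False`.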
What is ru_core_news_sm?
ru_core_news_sm is a small, CPU-optimized Russian pipeline developed by Explosion for the spaCy framework. It covers the core NLP tasks for Russian text: tokenization, morphological analysis, lemmatization, dependency parsing, sentence segmentation, and named entity recognition. As a small ("sm") package, it ships without static word vectors, which keeps its download and memory footprint low.
Implementation Details
The model is a pipeline of seven components: tok2vec, morphologizer, parser, senter, attribute_ruler, lemmatizer, and ner (named entity recognition). The statistical components are trained on Russian data from the Nerus corpus, while attribute_ruler and lemmatizer are rule-based. The senter component ships disabled by default because the parser already sets sentence boundaries. The reported scores are:
- Tokenization: 99.68% token accuracy
- Part-of-speech tagging: 98.77% accuracy
- Dependency parsing: 94.62% labeled attachment score (LAS)
- Named entity recognition: 94.98% F-score
Core Capabilities
- Morphological analysis: per-token features such as case, gender, and number (e.g. Case=Nom|Gender=Masc|Number=Sing)
- Named Entity Recognition for identifying PER (persons), ORG (organizations), and LOC (locations)
- Dependency Parsing with 40 distinct dependency labels
- Sentence Segmentation with 99.89% F-score
- Part-of-speech tagging with coarse Universal POS tags, predicted jointly with the morphological features by the morphologizer
Frequently Asked Questions
Q: What makes this model unique?
The model pairs a small package size with the accuracy figures listed above, which makes it a good fit for production environments where resource efficiency matters. Its morphological analysis is especially useful because Russian expresses grammatical relations through rich inflection (case, gender, number), so downstream tasks often depend on these features.
Q: What are the recommended use cases?
The model suits Russian-language applications such as information extraction and linguistic analysis. It does not include a text classifier, but its tokens, lemmas, and entities can serve as preprocessing for downstream tasks like text classification. It is particularly suitable for production environments where processing speed and resource efficiency are important.