ru_core_news_sm

Maintained By: spacy


Version: 3.7.0
License: MIT
Author: Explosion
Source: Nerus (Alexander Kukushkin)
spaCy Compatibility: >=3.7.0,<3.8.0
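The compatibility range above means the model is meant to be loaded with a spaCy 3.7.x release. spaCy's own `python -m spacy validate` command checks installed pipelines against the installed spaCy version; the minimal sketch below only illustrates the range logic itself. It assumes plain numeric `X.Y.Z` version strings (pre-release tags such as `rc1` are not handled), and the helper names are hypothetical.

```python
# Hypothetical helpers illustrating the ">=3.7.0,<3.8.0" constraint from the card.
# Assumption: versions are plain numeric "X.Y.Z" (or "X.Y") strings.

LOWER = (3, 7, 0)  # inclusive lower bound (>=3.7.0)
UPPER = (3, 8, 0)  # exclusive upper bound (<3.8.0)


def parse_version(version: str) -> tuple:
    """Convert 'X.Y.Z' into a comparable tuple of ints, padding missing parts with 0."""
    parts = [int(part) for part in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_compatible(version: str, lower: tuple = LOWER, upper: tuple = UPPER) -> bool:
    """Return True if lower <= version < upper, comparing tuples element by element."""
    return lower <= parse_version(version) < upper
```

With spaCy installed, `is_compatible(spacy.__version__)` applies the check to the running installation. Tuples are compared instead of raw strings because a lexical string comparison would misorder releases such as 3.9 and 3.10.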

What is ru_core_news_sm?

ru_core_news_sm is a small, CPU-optimized Russian pipeline developed by Explosion for the spaCy framework. It handles core natural language processing tasks for Russian text, with the accuracy figures reported below, while keeping resource requirements low.

Implementation Details

The pipeline consists of seven components: tok2vec, morphologizer, parser, senter, attribute_ruler, lemmatizer, and ner (named entity recognition). The trainable components are fitted to Russian data, while attribute_ruler and lemmatizer are rule-based. The model reports the following scores:
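To see which of these components a local installation provides, the pipeline can be loaded and its component names compared with the list above. This is a sketch assuming the package was installed with `python -m spacy download ru_core_news_sm`; `load_pipeline` and `missing_components` are hypothetical helper names. In spaCy, `nlp.component_names` includes disabled components, whereas `nlp.pipe_names` lists only active ones; in spaCy's core pipelines `senter` is typically shipped disabled (sentence boundaries then come from the parser), which `nlp.disabled` would confirm.

```python
# Component names as listed on the model card.
EXPECTED_COMPONENTS = [
    "tok2vec", "morphologizer", "parser", "senter",
    "attribute_ruler", "lemmatizer", "ner",
]


def load_pipeline(name: str = "ru_core_news_sm"):
    """Load the installed spaCy package.

    spaCy is imported lazily so the helper below stays usable without it.
    """
    import spacy

    return spacy.load(name)


def missing_components(present, expected=EXPECTED_COMPONENTS):
    """Return the expected component names absent from `present`, in card order."""
    present_set = set(present)
    return [name for name in expected if name not in present_set]
```

For a complete installation, `missing_components(load_pipeline().component_names)` should return an empty list.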

  • Token accuracy (tokenization): 99.68%
  • POS tagging accuracy: 98.77%
  • Dependency parsing: 94.62% labeled attachment score (LAS)
  • Named entity recognition: 94.98% F-score
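The two less self-explanatory metrics can be made concrete. LAS is the share of tokens whose predicted syntactic head and dependency label both match the gold annotation; the F-score is the harmonic mean of precision and recall, where for NER a prediction usually counts as correct only if both the entity span and its label match. The sketch below is a simplified illustration, not spaCy's scorer: it counts every token (evaluation setups sometimes exclude punctuation) and takes pre-counted true positives, false positives, and false negatives as input.

```python
def labeled_attachment_score(gold, predicted):
    """LAS: fraction of tokens whose (head index, dependency label) pair matches gold.

    `gold` and `predicted` are equal-length sequences of (head_index, label)
    tuples, one per token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must describe the same tokens")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def f_score(true_positives, false_positives, false_negatives):
    """Balanced F-score (F1) from raw counts; returns 0.0 when it is undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)
```

For example, in a three-token sentence where every head is correct but one dependency label is wrong, LAS is 2/3 (about 66.7%); with 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so the F-score is 0.8.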

Core Capabilities

  • Morphological Analysis covering Russian grammatical features such as case, gender, and number
  • Named Entity Recognition for identifying PER (persons), ORG (organizations), and LOC (locations)
  • Dependency Parsing with 40 distinct dependency labels
  • Sentence Segmentation with 99.89% F-score
  • Part-of-Speech Tagging, predicted together with the morphological features by the morphologizer
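A short sketch of how these capabilities surface in spaCy's API: entities via `doc.ents` (each with `.text` and `.label_`), sentences via `doc.sents`, and per-token `pos_`, `lemma_`, `dep_`, and `morph` attributes, where `token.morph.to_dict()` returns the morphological features as a dict. The helper names `group_entities` and `analyze` are hypothetical; `analyze` expects an already loaded pipeline such as the result of `spacy.load("ru_core_news_sm")`, and `group_entities` drops any label other than the three listed above.

```python
NER_LABELS = ("PER", "ORG", "LOC")  # entity labels listed on the model card


def group_entities(entities):
    """Group (text, label) pairs by label, keeping only the card's NER labels."""
    grouped = {label: [] for label in NER_LABELS}
    for text, label in entities:
        if label in grouped:
            grouped[label].append(text)
    return grouped


def analyze(text, nlp):
    """Run a loaded spaCy pipeline and collect entities, sentences, and token analyses."""
    doc = nlp(text)
    return {
        "entities": group_entities((ent.text, ent.label_) for ent in doc.ents),
        "sentences": [sent.text for sent in doc.sents],
        "tokens": [
            # text, coarse POS, lemma, dependency label, morphological features
            (tok.text, tok.pos_, tok.lemma_, tok.dep_, tok.morph.to_dict())
            for tok in doc
        ],
    }
```

For instance, `analyze("Александр Пушкин родился в Москве.", nlp)` would be expected to place the person under `PER` and the city under `LOC`, although the exact spans depend on the model's predictions.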

Frequently Asked Questions

Q: What makes this model unique?

The model pairs the accuracy figures above with a small footprint, which suits production environments where resources are constrained. Its morphological analysis is especially relevant for Russian, a highly inflected language in which case, gender, and number are marked on the word forms themselves.

Q: What are the recommended use cases?

The model suits Russian-language tasks such as information extraction and linguistic analysis, and it can supply preprocessing (tokens, lemmas, morphology, entities) for downstream tasks such as text classification; the pipeline itself does not include a text classifier. It is particularly suitable for production environments where processing speed and resource efficiency matter.
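For production workloads, spaCy's `nlp.pipe` processes texts in batches, which is generally faster than calling `nlp` on each text individually, and components a task does not need can be disabled at load time. A minimal sketch for entity extraction follows; the helper names, the batch size of 64, and the choice to disable `parser` and `lemmatizer` are assumptions for illustration, and any such change is worth checking against accuracy on one's own data.

```python
def load_for_ner(name: str = "ru_core_news_sm"):
    """Load the pipeline with components NER does not need switched off.

    Assumption: entity extraction alone needs neither the parser nor the
    lemmatizer. Disabled components remain loaded and can be re-enabled.
    """
    import spacy  # lazy import: spaCy is only needed when loading

    return spacy.load(name, disable=["parser", "lemmatizer"])


def extract_entities_batch(texts, nlp, batch_size=64):
    """Stream texts through nlp.pipe and return one list of (text, label) pairs per text."""
    return [
        [(ent.text, ent.label_) for ent in doc.ents]
        for doc in nlp.pipe(texts, batch_size=batch_size)
    ]
```

Passing `exclude=` instead of `disable=` to `spacy.load` skips loading those components entirely, which saves memory but means they cannot be re-enabled without reloading.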
