ru_core_news_lg

Maintained By
spacy


Property            Value
License             MIT
Vector Dimensions   300
Vocabulary Size     500,002 keys
spaCy Version       >=3.7.0,<3.8.0
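
The spaCy version constraint listed above can be checked before the model is loaded. The following is a minimal, standard-library-only sketch rather than an official spaCy utility: the helper names `parse_version`, `satisfies`, and `installed_spacy_is_compatible` are hypothetical, and the parser assumes plain numeric versions such as `3.7.2` and only the `>=` and `<` operators that appear in the constraint.

```python
from importlib import metadata

# Constraint copied from the property table above.
SPACY_CONSTRAINT = ">=3.7.0,<3.8.0"


def parse_version(version: str) -> tuple:
    """Turn '3.7.2' into (3, 7, 2). Assumes purely numeric segments."""
    parts = [int(part) for part in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad '3.7' to (3, 7, 0)
    return tuple(parts)


def satisfies(version: str, constraint: str = SPACY_CONSTRAINT) -> bool:
    """Return True if `version` meets every comma-separated clause."""
    current = parse_version(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            ok = current >= parse_version(clause[2:])
        elif clause.startswith("<") and not clause.startswith("<="):
            ok = current < parse_version(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not ok:
            return False
    return True


def installed_spacy_is_compatible() -> bool:
    """Check the locally installed spaCy, if any, against the constraint."""
    try:
        return satisfies(metadata.version("spacy"))
    except metadata.PackageNotFoundError:
        return False  # spaCy is not installed at all
```

Calling `installed_spacy_is_compatible()` before loading gives an early, readable signal when the environment has a spaCy release outside the supported range. A pre-release version string would need a fuller parser than this sketch provides.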

What is ru_core_news_lg?

ru_core_news_lg is a large Russian language pipeline for spaCy, optimized for CPU and developed by Explosion AI. It bundles several components that together cover tokenization, tagging, parsing, and entity recognition. Its reported scores include 95.24% precision in Named Entity Recognition (NER) and 98.93% accuracy in Part-of-Speech (POS) tagging.

Implementation Details

The pipeline consists of the following components: tok2vec, morphologizer, parser, senter, attribute_ruler, lemmatizer, and ner. It ships with 500,002 unique vectors of 300 dimensions and draws on the Nerus and Navec resources developed by Alexander Kukushkin. Reported evaluation scores:

  • Tokenization Accuracy: 99.68%
  • Morphological Analysis Accuracy: 97.49%
  • Dependency Parsing (LAS): 95.12%
  • Sentence Boundary Detection: 99.86% F-score
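
The component list and vector table described above can be inspected on a loaded pipeline. This is a minimal sketch, assuming spaCy and the `ru_core_news_lg` package are installed; `load_pipeline` and `describe_pipeline` are hypothetical helper names, while `component_names`, `pipe_names`, and `vocab.vectors.shape` are attributes of spaCy's `Language` object.

```python
def load_pipeline(name: str = "ru_core_news_lg"):
    """Load the installed pipeline. spaCy is imported lazily on purpose."""
    import spacy  # requires spaCy and the model package to be installed

    return spacy.load(name)


def describe_pipeline(nlp) -> dict:
    """Summarize the components and the vector table of a loaded pipeline."""
    n_vectors, n_dims = nlp.vocab.vectors.shape
    return {
        # every component shipped with the pipeline, including disabled ones
        "all_components": list(nlp.component_names),
        # only the components that run when nlp(text) is called
        "active_components": list(nlp.pipe_names),
        "n_vectors": n_vectors,
        "vector_dims": n_dims,
    }
```

Running `describe_pipeline(load_pipeline())` returns a small dictionary that can be compared against the figures on this page. The two component lists may differ, since a pipeline can ship components that are disabled by default.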

Core Capabilities

  • Named Entity Recognition for LOC, ORG, and PER entities
  • Advanced morphological analysis with 900+ label combinations
  • Comprehensive dependency parsing with 40 label types
  • High-accuracy sentence segmentation
  • Token-level annotation: part-of-speech tags, morphological features, and lemmas
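
The capabilities listed above map onto attributes of spaCy's `Doc`, `Token`, and `Span` objects. The sketch below shows one way to read them out; the helper names (`token_rows`, `entity_rows`, `sentence_texts`, `demo`) and the sample sentence are illustrative assumptions, and `demo()` requires spaCy and the `ru_core_news_lg` package to be installed.

```python
def token_rows(doc):
    """One tuple per token: text, lemma, POS tag, morphology, dependency, head."""
    return [
        (tok.text, tok.lemma_, tok.pos_, str(tok.morph), tok.dep_, tok.head.text)
        for tok in doc
    ]


def entity_rows(doc):
    """Named entities as (text, label) pairs, e.g. labels LOC, ORG, PER."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def sentence_texts(doc):
    """Sentence segmentation result as a list of strings."""
    return [sent.text for sent in doc.sents]


def demo():
    """Run the helpers on a sample text. Not called automatically."""
    import spacy  # requires spaCy and the model package to be installed

    nlp = spacy.load("ru_core_news_lg")
    doc = nlp("Александр Пушкин родился в Москве. Он учился в Царскосельском лицее.")
    for row in token_rows(doc):
        print(row)
    print(entity_rows(doc))
    print(sentence_texts(doc))
```

Keeping the extraction helpers separate from loading means the same functions work on any `Doc` the pipeline produces, whether from a single call or from batch processing.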

Frequently Asked Questions

Q: What makes this model unique?

The model combines high reported accuracy across tokenization, tagging, parsing, and NER with a large vector table (500,002 vectors), which makes it a practical choice for production deployments. Because it is optimized for CPU, it can be deployed in environments without a GPU.

Q: What are the recommended use cases?

The model is suited to Russian text analysis tasks including Named Entity Recognition, syntactic parsing, morphological analysis, and sentence segmentation. It fits applications that need accurate Russian language processing, such as content analysis and information extraction. The pipeline has no text classification component of its own, but its annotations and vectors can serve as input to a separate classifier.
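
As an illustration of the information extraction use case, the sketch below counts entity mentions across a batch of texts using `nlp.pipe`, spaCy's batch-processing method. The function name `count_entities` and its `labels` parameter are hypothetical; the default labels are the three entity types named on this page.

```python
from collections import Counter


def count_entities(nlp, texts, labels=("LOC", "ORG", "PER")):
    """Count (label, text) entity mentions over an iterable of texts."""
    counts = Counter()
    # nlp.pipe streams texts through the pipeline in batches,
    # which is usually faster than calling nlp(text) in a loop.
    for doc in nlp.pipe(texts):
        for ent in doc.ents:
            if ent.label_ in labels:
                counts[(ent.label_, ent.text)] += 1
    return counts
```

With a loaded pipeline, `count_entities(nlp, texts).most_common(10)` lists the ten most frequent mentions. Note that this counts surface forms, so inflected variants of the same name are tallied separately unless they are normalized first.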
