RuModernBERT-small
| Property | Value |
|---|---|
| Parameter Count | 35M |
| Hidden Dimensions | 384 |
| Number of Layers | 12 |
| Vocabulary Size | 50,368 |
| Context Length | 8,192 tokens |
| Model Type | Masked Language Model |
| Author | deepvk |
| Model URL | https://huggingface.co/deepvk/RuModernBERT-small |
What is RuModernBERT-small?
RuModernBERT-small is a compact Russian language model with 35M parameters, pre-trained on approximately 2 trillion tokens of Russian, English, and code data. This data mix makes it versatile for multilingual applications, and its context length of up to 8,192 tokens lets it process long text sequences effectively.
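As a quick illustration, here is a minimal masked-token prediction sketch. It assumes the checkpoint loads through the standard Hugging Face fill-mask pipeline (a recent transformers release with ModernBERT support); the example sentence is purely illustrative.

```python
from transformers import pipeline

# Assumes a transformers version that ships ModernBERT support.
fill_mask = pipeline("fill-mask", model="deepvk/RuModernBERT-small")

# Read the mask token from the tokenizer instead of hard-coding it.
mask = fill_mask.tokenizer.mask_token

# Each prediction carries the filled-in token and its score.
for prediction in fill_mask(f"Москва является столицей {mask}."):
    print(prediction["token_str"], round(prediction["score"], 3))
```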
Implementation Details
Training proceeded in three stages: massive pre-training, context extension, and cooldown. Each stage drew on curated data sources, including FineWeb, CulturaX-Ru-Edu, Wikipedia, ArXiv, books, code, StackExchange, and social media content. Initial training used a 1,024-token context length, which was extended to 8,192 tokens in the later stages.
- First stage: 1.3T tokens of pre-training at a 1,024-token context length
- Second stage: 250B tokens at the extended 8,192-token context length
- Final stage: 50B tokens of cooldown for model refinement
- Custom tokenizer trained on Russian and English FineWeb data (see the tokenizer sketch below)
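The sketch below shows how the extended window surfaces through the published tokenizer. It assumes the tokenizer reports the 8,192-token limit via model_max_length; the repeated Russian sentence is just a stand-in for a long document.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepvk/RuModernBERT-small")
print(tokenizer.model_max_length)  # expected to report the extended 8,192-token window

# A long stand-in document; truncation keeps the input within the context limit.
long_text = "Это пример длинного русского текста. " * 500
encoded = tokenizer(long_text, truncation=True, max_length=8192)
print(len(encoded["input_ids"]))
```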
Core Capabilities
- Strong performance on the Russian Super Glue (RSG) benchmark, with a 0.683 average score
- Efficient handling of both Russian and English text
- Extended context window of 8,192 tokens
- Excellent performance on various NLP tasks including sentiment analysis and toxicity detection
- Flash Attention 2 support for optimized inference (see the loading sketch after this list)
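Below is a minimal loading sketch for the Flash Attention 2 path. It assumes the flash-attn package is installed and a CUDA GPU is available, and relies on the standard transformers attn_implementation argument rather than anything model-specific.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepvk/RuModernBERT-small")
model = AutoModelForMaskedLM.from_pretrained(
    "deepvk/RuModernBERT-small",
    torch_dtype=torch.float16,                # Flash Attention 2 needs fp16 or bf16
    attn_implementation="flash_attention_2",  # requires the flash-attn package
).to("cuda")

text = f"Москва является {tokenizer.mask_token} России."
inputs = tokenizer(text, return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits
```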
Frequently Asked Questions
Q: What makes this model unique?
RuModernBERT-small stands out for combining a small footprint of only 35M parameters with an extended context length of 8,192 tokens. It performs competitively against larger models on several benchmarks while requiring significantly fewer computational resources.
Q: What are the recommended use cases?
The model excels at Russian NLP tasks such as text classification, sentiment analysis, and masked language modeling. It is particularly suitable for workloads that need efficient processing of both Russian and English text, making it a good fit for multilingual deployments under resource constraints.
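As a sketch of the classification use case, the snippet below loads the checkpoint with a sequence-classification head for fine-tuning. It assumes the model can be wrapped by AutoModelForSequenceClassification; the label count, output directory, and dataset placeholders are illustrative and not part of the original model card.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("deepvk/RuModernBERT-small")
model = AutoModelForSequenceClassification.from_pretrained(
    "deepvk/RuModernBERT-small",
    num_labels=2,  # e.g. positive / negative sentiment (illustrative)
)

def tokenize(batch):
    # Truncate to a modest length for classification; the full 8,192 window is rarely needed here.
    return tokenizer(batch["text"], truncation=True, max_length=512)

args = TrainingArguments(output_dir="rumodernbert-sentiment", num_train_epochs=3)

# `train_ds` / `eval_ds` stand in for any tokenized Russian sentiment corpus:
# trainer = Trainer(model=model, args=args, train_dataset=train_ds,
#                   eval_dataset=eval_ds, tokenizer=tokenizer)
# trainer.train()
```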