RuModernBERT-small
| Property | Value |
|---|---|
| Parameter Count | 35M |
| Hidden Dimensions | 384 |
| Number of Layers | 12 |
| Vocabulary Size | 50,368 |
| Context Length | 8,192 tokens |
| Model Type | Masked Language Model |
| Author | deepvk |
| Model URL | https://huggingface.co/deepvk/RuModernBERT-small |
What is RuModernBERT-small?
RuModernBERT-small is a compact Russian language model with 35M parameters, pre-trained on approximately 2 trillion tokens of Russian, English, and code data. This data mix makes it versatile for multilingual applications, and its context length of up to 8,192 tokens lets it process long text sequences effectively.
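As a quick illustration, here is a minimal masked-token prediction sketch. It assumes the checkpoint loads through the standard Hugging Face fill-mask pipeline (a recent transformers release with ModernBERT support); the example sentence is purely illustrative.

```python
from transformers import pipeline

# Assumes a transformers version that ships ModernBERT support.
fill_mask = pipeline("fill-mask", model="deepvk/RuModernBERT-small")

# Read the mask token from the tokenizer instead of hard-coding it.
mask = fill_mask.tokenizer.mask_token

# Each prediction carries the filled-in token and its score.
for prediction in fill_mask(f"Москва является столицей {mask}."):
    print(prediction["token_str"], round(prediction["score"], 3))
```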
Implementation Details
Training proceeded in three stages: massive pre-training, context extension, and cooldown. Each stage drew on curated data sources, including FineWeb, CulturaX-Ru-Edu, Wikipedia, ArXiv, books, code, StackExchange, and social media content. Initial training used a 1,024-token context length, which was extended to 8,192 tokens in the later stages.
- First stage: 1.3T tokens of pre-training at a 1,024-token context length
- Second stage: 250B tokens at the extended 8,192-token context length
- Final stage: 50B tokens of cooldown for model refinement
- Custom tokenizer trained on Russian and English FineWeb data (see the tokenizer sketch below)
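The sketch below shows how the extended window surfaces through the published tokenizer. It assumes the tokenizer reports the 8,192-token limit via model_max_length; the repeated Russian sentence is just a stand-in for a long document.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepvk/RuModernBERT-small")
print(tokenizer.model_max_length)  # expected to report the extended 8,192-token window

# A long stand-in document; truncation keeps the input within the context limit.
long_text = "Это пример длинного русского текста. " * 500
encoded = tokenizer(long_text, truncation=True, max_length=8192)
print(len(encoded["input_ids"]))
```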
Core Capabilities
- Strong performance on the Russian Super Glue (RSG) benchmark, with a 0.683 average score
- Efficient handling of both Russian and English text
- Extended context window of 8,192 tokens
- Excellent performance on various NLP tasks including sentiment analysis and toxicity detection
- Flash Attention 2 support for optimized inference (see the loading sketch after this list)
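Below is a minimal loading sketch for the Flash Attention 2 path. It assumes the flash-attn package is installed and a CUDA GPU is available, and relies on the standard transformers attn_implementation argument rather than anything model-specific.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepvk/RuModernBERT-small")
model = AutoModelForMaskedLM.from_pretrained(
    "deepvk/RuModernBERT-small",
    torch_dtype=torch.float16,                # Flash Attention 2 needs fp16 or bf16
    attn_implementation="flash_attention_2",  # requires the flash-attn package
).to("cuda")

text = f"Москва является {tokenizer.mask_token} России."
inputs = tokenizer(text, return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits
```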
Frequently Asked Questions
Q: What makes this model unique?
RuModernBERT-small stands out for combining a small footprint of only 35M parameters with an extended context length of 8,192 tokens. It performs competitively against larger models on several benchmarks while requiring significantly fewer computational resources.
Q: What are the recommended use cases?
The model excels at Russian NLP tasks such as text classification, sentiment analysis, and masked language modeling. It is particularly suitable for workloads that need efficient processing of both Russian and English text, making it a good fit for multilingual deployments under resource constraints.
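As a sketch of the classification use case, the snippet below loads the checkpoint with a sequence-classification head for fine-tuning. It assumes the model can be wrapped by AutoModelForSequenceClassification; the label count, output directory, and dataset placeholders are illustrative and not part of the original model card.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("deepvk/RuModernBERT-small")
model = AutoModelForSequenceClassification.from_pretrained(
    "deepvk/RuModernBERT-small",
    num_labels=2,  # e.g. positive / negative sentiment (illustrative)
)

def tokenize(batch):
    # Truncate to a modest length for classification; the full 8,192 window is rarely needed here.
    return tokenizer(batch["text"], truncation=True, max_length=512)

args = TrainingArguments(output_dir="rumodernbert-sentiment", num_train_epochs=3)

# `train_ds` / `eval_ds` stand in for any tokenized Russian sentiment corpus:
# trainer = Trainer(model=model, args=args, train_dataset=train_ds,
#                   eval_dataset=eval_ds, tokenizer=tokenizer)
# trainer.train()
```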