RuModernBERT-base
| Property | Value |
|---|---|
| Parameter Count | 150M |
| Hidden Dimension | 768 |
| Number of Layers | 22 |
| Vocabulary Size | 50,368 |
| Context Length | 8,192 tokens |
| Model Type | Masked Language Model |
| Author | deepvk |
What is RuModernBERT-base?
RuModernBERT-base is a 150M-parameter masked language model for Russian. It was pre-trained on approximately 2 trillion tokens of Russian, English, and code data using a three-stage schedule: massive pre-training, context extension, and a final cooldown phase.
Implementation Details
The model uses 22 layers, a hidden dimension of 768, and a vocabulary of 50,368 tokens. Its 8,192-token context length allows it to process much longer sequences than traditional BERT models. Training was structured across three stages: the first stage used a 1,024-token context length, which was extended to 8,192 tokens in the later stages.
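One practical consequence of the 8,192-token limit is that documents longer than the context window still need to be split before inference. The sketch below shows one common way to do that with overlapping windows; it uses a toy whitespace "tokenizer" as a stand-in, so the counts it produces will differ from the model's real subword tokenizer, and the window/overlap sizes are illustrative choices, not values from this card.

```python
# Toy illustration: splitting a long text into overlapping windows that each
# fit within an 8,192-token budget. Whitespace splitting stands in for the
# model's real subword tokenizer, which would report different token counts.

def chunk_tokens(tokens, max_len=8192, overlap=256):
    """Yield overlapping windows of at most max_len tokens."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    for start in range(0, max(len(tokens) - overlap, 1), step):
        yield tokens[start:start + max_len]

text = "слово " * 20000           # ~20k pseudo-tokens, longer than the context
tokens = text.split()
windows = list(chunk_tokens(tokens))

print(len(windows))                            # → 3
print(all(len(w) <= 8192 for w in windows))    # → True
```

Each window after the first repeats the last 256 tokens of the previous one, so no span of text is ever seen without some surrounding context.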
- Trained on diverse data sources including FineWeb, CulturaX-Ru-Edu, Wiki, ArXiv, books, code, and social media content
- Supports Flash Attention 2 for improved throughput
- Achieves state-of-the-art performance on Russian SuperGLUE (RSG) with a score of 0.737
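As a masked language model, the checkpoint can be exercised directly with the Hugging Face `transformers` fill-mask pipeline. This is a minimal sketch, not an official snippet from the model card: the repository id `deepvk/RuModernBERT-base` is assumed from the author and model name above, and the first call downloads the weights.

```python
# Hedged usage sketch: fill-mask inference via Hugging Face transformers.
# The checkpoint id "deepvk/RuModernBERT-base" is assumed, not confirmed here.
from transformers import pipeline

fill = pipeline("fill-mask", model="deepvk/RuModernBERT-base")

# Use the tokenizer's own mask token rather than hard-coding "[MASK]".
preds = fill(f"Столица России — {fill.tokenizer.mask_token}.")
for p in preds[:3]:
    print(p["token_str"], round(p["score"], 3))
```

The pipeline returns candidate fillers for the masked position ranked by probability, which is a quick sanity check that the model has loaded correctly.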
Core Capabilities
- Advanced masked language modeling for Russian text
- Extended context processing up to 8,192 tokens
- Strong performance on multiple downstream tasks including RCB, MuSeRC, and RUSSE
- Effective handling of both Russian and English content
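The masked-language-modeling objective behind these capabilities can be sketched in plain Python. The 15% masking rate and the 80/10/10 split (mask / random token / keep) follow the standard BERT recipe; this card does not state RuModernBERT's exact corruption settings, so treat those ratios as assumptions.

```python
import random

# Sketch of BERT-style MLM input corruption. The 15% rate and 80/10/10 split
# are the standard BERT defaults, assumed here rather than taken from the card.
def mask_for_mlm(tokens, vocab, mask_token="[MASK]", rate=0.15, rng=None):
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            labels.append(tok)               # model must predict the original
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_token)     # replaced with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # replaced with random token
            else:
                corrupted.append(tok)        # kept unchanged, still predicted
        else:
            labels.append(None)              # position not scored in the loss
            corrupted.append(tok)
    return corrupted, labels

vocab = ["столица", "россии", "москва", "это", "город"]
tokens = ["столица", "россии", "это", "москва"] * 50
corrupted, labels = mask_for_mlm(tokens, vocab)
print(sum(l is not None for l in labels), "of", len(tokens), "positions scored")
```

The loss is computed only at the selected positions, which is why the model learns bidirectional representations rather than left-to-right generation.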
Frequently Asked Questions
Q: What makes this model unique?
RuModernBERT-base stands out for its extensive pre-training on 2 trillion tokens, its three-stage training schedule, and its 8,192-token context length. It achieves strong results on Russian language benchmarks while retaining solid English capabilities.
Q: What are the recommended use cases?
The model excels in masked language modeling tasks and can be effectively used for text understanding, classification, and analysis in Russian and English contexts. It's particularly well-suited for applications requiring long context understanding and multilingual capabilities.