RuModernBERT-base
| Property | Value |
|---|---|
| Parameter Count | 150M |
| Hidden Dimension | 768 |
| Number of Layers | 22 |
| Vocabulary Size | 50,368 |
| Context Length | 8,192 tokens |
| Model Type | Masked Language Model |
| Author | deepvk |
What is RuModernBERT-base?
RuModernBERT-base is a 150M-parameter masked language model for Russian. It was pre-trained on approximately 2 trillion tokens of Russian, English, and code data using a three-stage schedule: massive pre-training, context extension, and a final cooldown phase.
Implementation Details
The model uses 22 layers, a hidden dimension of 768, and a vocabulary of 50,368 tokens. Its 8,192-token context length allows it to process much longer sequences than traditional BERT models. Training was structured across three stages: the first stage used a 1,024-token context length, which was extended to 8,192 tokens in the later stages.
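One practical consequence of the 8,192-token limit is that documents longer than the context window still need to be split before inference. The sketch below shows one common way to do that with overlapping windows; it uses a toy whitespace "tokenizer" as a stand-in, so the counts it produces will differ from the model's real subword tokenizer, and the window/overlap sizes are illustrative choices, not values from this card.

```python
# Toy illustration: splitting a long text into overlapping windows that each
# fit within an 8,192-token budget. Whitespace splitting stands in for the
# model's real subword tokenizer, which would report different token counts.

def chunk_tokens(tokens, max_len=8192, overlap=256):
    """Yield overlapping windows of at most max_len tokens."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    for start in range(0, max(len(tokens) - overlap, 1), step):
        yield tokens[start:start + max_len]

text = "слово " * 20000           # ~20k pseudo-tokens, longer than the context
tokens = text.split()
windows = list(chunk_tokens(tokens))

print(len(windows))                            # → 3
print(all(len(w) <= 8192 for w in windows))    # → True
```

Each window after the first repeats the last 256 tokens of the previous one, so no span of text is ever seen without some surrounding context.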
- Trained on diverse data sources including FineWeb, CulturaX-Ru-Edu, Wiki, ArXiv, books, code, and social media content
- Supports Flash Attention 2 for improved throughput
- Achieves state-of-the-art performance on Russian SuperGLUE (RSG) with a score of 0.737
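As a masked language model, the checkpoint can be exercised directly with the Hugging Face `transformers` fill-mask pipeline. This is a minimal sketch, not an official snippet from the model card: the repository id `deepvk/RuModernBERT-base` is assumed from the author and model name above, and the first call downloads the weights.

```python
# Hedged usage sketch: fill-mask inference via Hugging Face transformers.
# The checkpoint id "deepvk/RuModernBERT-base" is assumed, not confirmed here.
from transformers import pipeline

fill = pipeline("fill-mask", model="deepvk/RuModernBERT-base")

# Use the tokenizer's own mask token rather than hard-coding "[MASK]".
preds = fill(f"Столица России — {fill.tokenizer.mask_token}.")
for p in preds[:3]:
    print(p["token_str"], round(p["score"], 3))
```

The pipeline returns candidate fillers for the masked position ranked by probability, which is a quick sanity check that the model has loaded correctly.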
Core Capabilities
- Advanced masked language modeling for Russian text
- Extended context processing up to 8,192 tokens
- Strong performance on multiple downstream tasks including RCB, MuSeRC, and RUSSE
- Effective handling of both Russian and English content
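The masked-language-modeling objective behind these capabilities can be sketched in plain Python. The 15% masking rate and the 80/10/10 split (mask / random token / keep) follow the standard BERT recipe; this card does not state RuModernBERT's exact corruption settings, so treat those ratios as assumptions.

```python
import random

# Sketch of BERT-style MLM input corruption. The 15% rate and 80/10/10 split
# are the standard BERT defaults, assumed here rather than taken from the card.
def mask_for_mlm(tokens, vocab, mask_token="[MASK]", rate=0.15, rng=None):
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            labels.append(tok)               # model must predict the original
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_token)     # replaced with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # replaced with random token
            else:
                corrupted.append(tok)        # kept unchanged, still predicted
        else:
            labels.append(None)              # position not scored in the loss
            corrupted.append(tok)
    return corrupted, labels

vocab = ["столица", "россии", "москва", "это", "город"]
tokens = ["столица", "россии", "это", "москва"] * 50
corrupted, labels = mask_for_mlm(tokens, vocab)
print(sum(l is not None for l in labels), "of", len(tokens), "positions scored")
```

The loss is computed only at the selected positions, which is why the model learns bidirectional representations rather than left-to-right generation.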
Frequently Asked Questions
Q: What makes this model unique?
RuModernBERT-base stands out for its extensive pre-training on 2 trillion tokens, its three-stage training schedule, and its 8,192-token context length. It achieves strong results on Russian language benchmarks while retaining solid English capabilities.
Q: What are the recommended use cases?
The model excels in masked language modeling tasks and can be effectively used for text understanding, classification, and analysis in Russian and English contexts. It's particularly well-suited for applications requiring long context understanding and multilingual capabilities.