RuModernBERT-base

Maintained By
deepvk


| Property | Value |
|---|---|
| Parameter Count | 150M |
| Hidden Dimension | 768 |
| Number of Layers | 22 |
| Vocabulary Size | 50,368 |
| Context Length | 8,192 tokens |
| Model Type | Masked Language Model |
| Author | deepvk |

What is RuModernBERT-base?

RuModernBERT-base is a Russian version of the ModernBERT encoder. This 150M-parameter masked language model was pre-trained on approximately 2 trillion tokens of Russian, English, and code, using a three-stage training recipe: massive pre-training, context extension, and a cooldown phase.

Implementation Details

The model uses 22 layers, a hidden dimension of 768, and a vocabulary of 50,368 tokens. Its 8,192-token context length allows it to process sequences 16 times longer than the original BERT's 512-token limit. Training was structured across three stages: the first stage used a 1,024-token context length, which was extended to 8,192 tokens in the later stages.
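The quoted 150M parameter count is consistent with these dimensions. Below is a rough back-of-the-envelope estimate, assuming a standard ModernBERT-base layout (GeGLU feed-forward network with an intermediate size of 1,152, tied input/output embeddings, no bias terms); these layout details are assumptions, not stated in this card:

```python
# Rough parameter estimate for a ModernBERT-base-style encoder.
# Assumed (not stated in the card): GeGLU FFN with intermediate size 1152,
# tied embeddings, no bias terms.

hidden = 768
layers = 22
vocab = 50_368
intermediate = 1_152  # assumed GeGLU FFN width

embeddings = vocab * hidden                    # token embeddings (tied with the LM head)
attn = 3 * hidden * hidden + hidden * hidden   # fused QKV projection + output projection
ffn = hidden * 2 * intermediate + intermediate * hidden  # GeGLU: gate+up, then down
per_layer = attn + ffn

total = embeddings + layers * per_layer
print(f"~{total / 1e6:.0f}M parameters")       # roughly 149M, matching the quoted 150M
```

The estimate lands near 149M, so the headline figure is dominated by the 22 transformer layers plus the ~39M-parameter embedding matrix.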

  • Trained on diverse data sources including FineWeb, CulturaX-Ru-Edu, Wiki, ArXiv, books, code, and social media content
  • Uses Flash Attention 2 for improved throughput
  • Achieves state-of-the-art performance on the Russian SuperGLUE (RSG) benchmark with a score of 0.737
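As a masked language model, RuModernBERT-base fills a masked slot by producing a logit for every vocabulary entry at that position; the highest-probability tokens become the suggested fills. A minimal, framework-free sketch of that decoding step over a toy vocabulary (the vocabulary and logit values here are invented for illustration, not model outputs):

```python
import math

def top_k_fills(logits, vocab, k=3):
    """Turn mask-position logits into the k most probable token fills."""
    m = max(logits)
    exp = [math.exp(x - m) for x in logits]  # numerically stable softmax
    z = sum(exp)
    probs = [e / z for e in exp]
    ranked = sorted(zip(vocab, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Toy logits a model might emit for the masked slot in "Столица России - [MASK]."
vocab = ["Москва", "Париж", "город", "Лондон"]
logits = [9.1, 4.2, 6.5, 3.8]
for token, prob in top_k_fills(logits, vocab):
    print(f"{token}: {prob:.3f}")
```

In practice the real model scores all 50,368 vocabulary entries at each masked position; the ranking step is the same.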

Core Capabilities

  • Advanced masked language modeling for Russian text
  • Extended context processing up to 8,192 tokens
  • Strong performance on multiple downstream tasks including RCB, MuSeRC, and RUSSE
  • Effective handling of both Russian and English content
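Even with an 8,192-token window, real documents can exceed the context length. A common pattern is to split the token sequence into overlapping windows sized to the model limit; this is a generic sketch of that pattern (the window and stride values are illustrative, not recommendations from the card):

```python
def sliding_windows(token_ids, max_len=8192, stride=7680):
    """Split a token sequence into overlapping windows of at most max_len tokens."""
    if len(token_ids) <= max_len:
        return [token_ids]
    windows = []
    start = 0
    while start < len(token_ids):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # last window reaches the end of the sequence
        start += stride
    return windows

# A 20,000-token document becomes three overlapping windows.
chunks = sliding_windows(list(range(20_000)))
print([len(c) for c in chunks])  # [8192, 8192, 4640]
```

The 512-token overlap (`max_len - stride`) gives each window some shared context with its neighbor, which helps when predictions near window boundaries matter.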

Frequently Asked Questions

Q: What makes this model unique?

RuModernBERT-base stands out for its pre-training on roughly 2 trillion tokens, its three-stage training recipe, and its 8,192-token context length. It achieves strong results on Russian language benchmarks while retaining solid capabilities in English.

Q: What are the recommended use cases?

The model excels in masked language modeling tasks and can be effectively used for text understanding, classification, and analysis in Russian and English contexts. It's particularly well-suited for applications requiring long context understanding and multilingual capabilities.
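For classification and retrieval, encoder outputs are typically reduced to one vector per text, for example by mean pooling over non-padding positions. A framework-free sketch of that pooling step (the toy hidden states and attention mask are invented; in practice they come from the model and its tokenizer):

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(hidden_states, attention_mask):
        if m:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [t / count for t in total]

# Toy 4-token sequence with 2-dim hidden states; the last token is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(masked_mean_pool(hidden, mask))  # [3.0, 4.0]
```

Excluding padded positions matters: including the padding vector above would pull the average toward [4.5, 5.25] and distort downstream similarity or classification scores.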
