ModernBERT-base
| Property | Value |
|---|---|
| Parameter Count | 149 million |
| Context Length | 8,192 tokens |
| Training Data | 2 trillion tokens |
| License | Apache 2.0 |
| Paper | arXiv:2412.13663 |
What is ModernBERT-base?
ModernBERT-base is a state-of-the-art bidirectional encoder that modernizes the original BERT architecture with recent advances in transformer design. It was trained on an extensive dataset of 2 trillion tokens spanning both English text and code.
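As a quick orientation, here is a minimal masked-language-modeling sketch using the Hugging Face transformers pipeline; it assumes the model is available under the repo id answerdotai/ModernBERT-base and that a transformers release with ModernBERT support is installed.

```python
# A minimal fill-mask sketch, assuming the Hugging Face repo id
# "answerdotai/ModernBERT-base" and a transformers version with ModernBERT support.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

# ModernBERT uses the standard [MASK] token for masked-language modeling.
predictions = fill_mask("The capital of France is [MASK].")
for p in predictions[:3]:
    print(f"{p['token_str']!r}  score={p['score']:.3f}")
```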
Implementation Details
The model implements several modern architectural improvements, including Rotary Positional Embeddings (RoPE), alternating local-global attention, and Flash Attention support. It is built on a pre-norm transformer architecture with GeGLU activations and was trained with the StableAdamW optimizer and a trapezoidal learning rate schedule. A loading sketch that exercises the long-context and Flash Attention features follows the list below.
- 22 transformer layers
- Native support for sequences up to 8,192 tokens
- Efficient unpadding and Flash Attention optimization
- Pre-trained on both text and code data
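The sketch below shows one way to exercise these features: it loads the encoder with the flash_attention_2 implementation and runs a long input. It assumes the repo id answerdotai/ModernBERT-base, a CUDA-capable GPU, and an installed flash-attn package; omit the attn_implementation argument to fall back to the default attention.

```python
# A loading sketch for long-context inference with Flash Attention 2.
# Assumes the repo id "answerdotai/ModernBERT-base", a CUDA-capable GPU,
# and the flash-attn package installed.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")

long_text = "Replace with a long document (up to 8,192 tokens of text or code)."
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=8192).to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)
```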
Core Capabilities
- Superior performance on GLUE benchmark tasks
- Excellent retrieval capabilities on BEIR and MLDR datasets
- State-of-the-art results in code retrieval tasks
- Efficient processing of long-context inputs
- Strong performance in both single-vector and multi-vector retrieval settings (a single-vector embedding sketch follows this list)
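For the single-vector setting, one common recipe (a sketch, not the official retrieval pipeline) is to mean-pool the encoder's token states into a single embedding per text and rank documents by cosine similarity. The example below assumes the repo id answerdotai/ModernBERT-base; in practice the checkpoint would typically be fine-tuned for retrieval before use.

```python
# A single-vector retrieval sketch: mean-pool ModernBERT token states into one
# embedding per text and rank by cosine similarity. Assumes the repo id
# "answerdotai/ModernBERT-base"; for strong retrieval quality the base model
# is normally fine-tuned first.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean over real tokens
    return F.normalize(pooled, dim=-1)

query = embed(["How do I sort a list in Python?"])
docs = embed(["Use the built-in sorted() function.", "Paris is the capital of France."])
print(query @ docs.T)  # cosine similarities; higher = more relevant
```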
Frequently Asked Questions
Q: What makes this model unique?
ModernBERT-base stands out through its combination of modern architectural improvements, large-scale training on diverse text and code, and efficient handling of long sequences. It delivers strong downstream accuracy while maintaining practical inference speeds, particularly when run with Flash Attention 2.
Q: What are the recommended use cases?
The model excels in tasks requiring long document processing, including document retrieval, classification, and semantic search. It's particularly effective for code-related tasks and hybrid (text + code) semantic search applications. The model can be fine-tuned for specific downstream tasks following standard BERT fine-tuning approaches.
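As an illustration of that standard fine-tuning path, the sketch below attaches a sequence-classification head and trains it with the Hugging Face Trainer. The imdb dataset and the hyperparameters are illustrative placeholders, not recommended settings, and the repo id answerdotai/ModernBERT-base is assumed.

```python
# A fine-tuning sketch following the standard BERT recipe: add a classification
# head and train with the Hugging Face Trainer. Dataset and hyperparameters are
# placeholders for illustration only.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

dataset = load_dataset("imdb")  # placeholder binary-classification dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="modernbert-finetuned",
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    processing_class=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```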