NeoBERT

Maintained by chandar-lab


  • Parameter Count: 250M
  • Context Length: 4,096 tokens
  • Architecture: 28 layers × 768 width
  • Training Data: RefinedWeb (2.8 TB)
  • License: MIT

What is NeoBERT?

NeoBERT represents a significant advancement in transformer-based language models, designed as a next-generation encoder for English text representation. Pre-trained from scratch on the massive RefinedWeb dataset, it combines modern architectural improvements with optimized training methodologies while maintaining a relatively compact 250M parameter footprint.

Implementation Details

The model combines several modern architectural and training choices:

  • SwiGLU activation function in the feed-forward layers
  • RoPE (Rotary Positional Embeddings) for position encoding
  • Pre-RMSNorm for stable training
  • FlashAttention for computational efficiency
  • 20% MLM masking rate during pre-training
  • Trained on 2.1T tokens with the AdamW optimizer and a cosine decay schedule
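The two architectural components above that differ most from the original BERT can be sketched in a few lines. This is an illustrative NumPy sketch, not NeoBERT's actual implementation: the shapes, the half-split RoPE layout, and the frequency base of 10000 are assumptions for demonstration.

```python
import numpy as np

def swiglu(x, W, V, W2):
    """SwiGLU feed-forward block: (SiLU(x @ W) * (x @ V)) @ W2.

    Illustrative only; dimensions do not match NeoBERT's weights.
    """
    silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU (a.k.a. swish) gate
    return (silu(x @ W) * (x @ V)) @ W2

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim).

    Channel pairs are rotated by a position-dependent angle, so attention
    scores end up depending on relative token positions.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair frequencies
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 positions, toy width 8
W, V = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 8))
print(swiglu(x, W, V, W2).shape)  # (4, 8)
print(rope(x).shape)              # (4, 8)
```

Note that `rope` leaves position 0 unchanged (all rotation angles are zero there), which is a quick sanity check for any RoPE implementation.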

Core Capabilities

  • State-of-the-art performance on the MTEB benchmark
  • Extended context length of 4,096 tokens
  • Plug-and-play replacement for existing base models
  • Efficient processing with optimized depth-to-width ratio
  • Superior performance compared to larger models like BERT-large and RoBERTa-large
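Because it is a plug-and-play replacement, loading typically goes through the standard Hugging Face `transformers` Auto classes. The sketch below assumes the `chandar-lab/NeoBERT` checkpoint id and that `trust_remote_code=True` is needed for its custom architecture; verify both against the model card before use.

```python
def load_neobert(model_id: str = "chandar-lab/NeoBERT"):
    """Load the tokenizer and encoder for NeoBERT.

    Requires the `transformers` package and network access on first call.
    The model id and the trust_remote_code flag are assumptions based on
    the public checkpoint; check the model card for the current values.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model
```

Any pipeline already built around `AutoModel`/`AutoTokenizer` for a BERT-style encoder should only need the model id changed.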

Frequently Asked Questions

Q: What makes this model unique?

NeoBERT stands out through its optimal balance of efficiency and performance, achieving state-of-the-art results despite its modest 250M parameter count. It incorporates modern architectural improvements while maintaining compatibility with existing BERT-based workflows.

Q: What are the recommended use cases?

The model is ideal for general-purpose text representation tasks, particularly when efficiency is crucial. It's especially suitable for applications requiring longer context understanding (up to 4,096 tokens) and can serve as a drop-in replacement for existing BERT-based models.
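For general-purpose text representation, a common way to turn an encoder's token-level outputs into one sentence vector is masked mean pooling over the final hidden states. A minimal NumPy sketch, where the hidden states and attention mask are stand-ins for what an encoder like NeoBERT would return:

```python
import numpy as np

def mean_pool(hidden, mask):
    """Average token embeddings into one vector, ignoring padding.

    hidden: (seq_len, dim) final-layer hidden states
    mask:   (seq_len,) attention mask, 1 for real tokens, 0 for padding
    """
    mask = mask[:, None].astype(hidden.dtype)
    return (hidden * mask).sum(axis=0) / np.maximum(mask.sum(), 1.0)

hidden = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])  # last row is padding
mask = np.array([1, 1, 0])
print(mean_pool(hidden, mask))  # [2. 3.]
```

Whether mean pooling, CLS pooling, or a fine-tuned head works best depends on the downstream task; this is just the simplest baseline.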
