Maintained by: sbintuitions

ModernBERT-Ja-30M

  • Total Parameters: 37M
  • Architecture Parameters (without embeddings): 10M
  • Dimension: 256
  • License: MIT
  • Model Type: Masked Language Model
  • Context Length: 8,192 tokens

What is modernbert-ja-30m?

ModernBERT-Ja-30M is an innovative Japanese language model that combines local and global attention mechanisms to process long sequences efficiently. Developed by SB Intuitions, it's trained on a massive corpus of 4.39T tokens of Japanese and English text, featuring a vocabulary size of 102,400. The model represents a significant advancement in Japanese language processing, incorporating modern architectural improvements like RoPE (Rotary Position Embedding).
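As a concrete starting point, the snippet below is a minimal sketch of fill-mask inference with the Hugging Face transformers library. It assumes the hub ID sbintuitions/modernbert-ja-30m, a transformers version with native ModernBERT support (4.48 or later), and <mask> as the mask token; adjust for your environment as needed.

```python
# Minimal fill-mask sketch (assumptions: transformers>=4.48, hub ID below, <mask> token).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="sbintuitions/modernbert-ja-30m")

# Raw Japanese input; no morphological pre-tokenization is applied first.
# "おはようございます、今日の天気は<mask>です。" ~ "Good morning, today's weather is <mask>."
for prediction in fill_mask("おはようございます、今日の天気は<mask>です。"):
    print(f"{prediction['token_str']}\t{prediction['score']:.4f}")
```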

Implementation Details

The model employs a three-stage training process: initial pre-training on 3.51T tokens, followed by two context-extension phases on high-quality data. The architecture has 10 layers with an intermediate dimension of 1,024 and interleaves global and local attention, with every third layer using global attention and the remaining layers using local sliding-window attention (a repeating 1 global + 2 local pattern). Key settings are listed below, followed by a sketch for reading them from the released configuration.

  • Sliding window attention with 128-token context size
  • Global RoPE theta: 160,000
  • Local RoPE theta: 10,000
  • GELU activation function
  • Unigram language model tokenizer with byte fallback
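These hyperparameters can be cross-checked against the released configuration. The sketch below is illustrative only: the attribute names follow the ModernBERT config class in transformers and are assumptions here, so any name that does not match simply prints as None.

```python
# Hedged sketch for inspecting the published config; attribute names are assumptions
# based on transformers' ModernBERT config and may differ in practice.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("sbintuitions/modernbert-ja-30m")

for name in (
    "hidden_size",                 # expected 256 (model dimension)
    "intermediate_size",           # expected 1,024
    "num_hidden_layers",           # expected 10
    "local_attention",             # expected 128 (sliding-window size)
    "global_rope_theta",           # expected 160,000
    "local_rope_theta",            # expected 10,000
    "global_attn_every_n_layers",  # global/local interleaving pattern
    "max_position_embeddings",     # expected 8,192
    "vocab_size",                  # expected 102,400
):
    print(f"{name}: {getattr(config, name, None)}")
```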

Core Capabilities

  • Masked language modeling with strong performance on short sequences
  • Efficient processing of sequences up to 8,192 tokens
  • Achieves an average score of 85.67 across 12 evaluation datasets
  • Strong results on Japanese understanding tasks, including the JGLUE benchmarks
  • Works directly on raw text without pre-tokenization (see the tokenizer sketch after this list)
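The sketch below illustrates the raw-text workflow mentioned above, assuming the Hugging Face tokenizer published with this checkpoint; the example sentence is arbitrary.

```python
# Raw text goes straight into the tokenizer; no external Japanese pre-tokenizer is used.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sbintuitions/modernbert-ja-30m")

# "ModernBERT-Ja-30M is a masked language model pre-trained on Japanese and English text."
text = "ModernBERT-Ja-30Mは日本語と英語のテキストで事前学習されたマスク言語モデルです。"
encoding = tokenizer(text)

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# Expected to reflect the 8,192-token context length (an assumption about the
# tokenizer config; verify against the published files).
print("model_max_length:", tokenizer.model_max_length)
```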

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its hybrid attention mechanism combining local and global attention, allowing it to handle long sequences efficiently while maintaining strong performance on shorter texts. It's also notable for its extensive training on both Japanese and English data, totaling 4.39T tokens.

Q: What are the recommended use cases?

The model is primarily designed for masked language modeling and fine-tuning on downstream tasks. It's particularly effective for tasks requiring understanding of Japanese text, though it's not recommended for text generation tasks or token classification tasks like named entity recognition.
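For downstream fine-tuning, a standard transformers sequence-classification setup applies. The outline below is a hedged sketch, not the authors' recipe: the CSV files, column names, label count, and hyperparameters are placeholders to swap for your own task, and it assumes a transformers version with ModernBERT support (4.48 or later).

```python
# Hedged fine-tuning sketch for a downstream Japanese text-classification task.
# Dataset files, label count, and hyperparameters below are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "sbintuitions/modernbert-ja-30m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Hypothetical CSV files with "text" and "label" columns; replace with your own data.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="modernbert-ja-30m-finetuned",
        per_device_train_batch_size=32,
        learning_rate=5e-5,
        num_train_epochs=3,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```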
