deberta-v2-base-japanese-char-wwm

Japanese DeBERTa V2 base model (122M params) pre-trained on Wikipedia, CC-100, and OSCAR. Features character-level tokenization and whole word masking for advanced NLP tasks.

Parameter Count: 122M
License: CC-BY-SA-4.0
Training Data: Wikipedia, CC-100, OSCAR
Token Type: Character-level
Author: ku-nlp

What is deberta-v2-base-japanese-char-wwm?

This is a Japanese language model based on the DeBERTa V2 architecture that combines character-level tokenization with whole word masking (WWM). Pre-trained on 171GB of Japanese text from Wikipedia, CC-100, and OSCAR, it serves as a general-purpose base model for Japanese natural language processing.
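For masked language modeling, the checkpoint can be loaded through the Hugging Face transformers fill-mask pipeline. The snippet below is a minimal sketch, assuming transformers is installed and that the model id matches the card title; it is illustrative rather than an official usage example.

```python
from transformers import pipeline

# Minimal fill-mask sketch (assumes the transformers library and the model id
# "ku-nlp/deberta-v2-base-japanese-char-wwm" taken from this card).
fill_mask = pipeline("fill-mask", model="ku-nlp/deberta-v2-base-japanese-char-wwm")

# Tokenization is character-level, so [MASK] stands in for a single character.
for candidate in fill_mask("京都大学で自然言語処理を[MASK]究しています。"):
    print(candidate["token_str"], round(candidate["score"], 3))
```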

Implementation Details

The model was trained on 8 NVIDIA A100-SXM4-40GB GPUs over 20 days, using a sentencepiece vocabulary of 22,012 tokens. Training ran for 320,000 steps with a linear learning rate schedule and warmup; the key hyperparameters are listed below, followed by an illustrative configuration sketch.

  • Learning rate: 2e-4 with Adam optimizer
  • Batch size: 2,208 (total)
  • Sequence length: 512 tokens
  • Training corpus: 171GB combined data
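For orientation only, the hyperparameters above could be expressed roughly as follows with the transformers TrainingArguments API. This is a hypothetical sketch, not the authors' pre-training script; the warmup step count and the per-device batch split are placeholders that are not stated in this card.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
# warmup_steps and the per-device batch split are illustrative placeholders.
args = TrainingArguments(
    output_dir="deberta-v2-base-japanese-char-wwm",
    learning_rate=2e-4,               # Adam-style optimizer, as listed above
    lr_scheduler_type="linear",       # linear schedule with warmup
    warmup_steps=10_000,              # placeholder; not specified in this card
    max_steps=320_000,
    per_device_train_batch_size=276,  # 276 x 8 GPUs = 2,208 total (assumed split)
)
```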

Core Capabilities

  • Masked Language Modeling
  • Character-level tokenization
  • Whole Word Masking
  • Fine-tuning support for downstream tasks

Frequently Asked Questions

Q: What makes this model unique?

This model combines character-level tokenization with whole word masking, making it particularly effective for Japanese text processing. Its training on a diverse dataset including Wikipedia, CC-100, and OSCAR provides robust language understanding capabilities.
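A quick way to see the character-level behavior is to tokenize a short phrase directly. This is a sketch assuming the transformers library; the exact output may include sentencepiece markers.

```python
from transformers import AutoTokenizer

# Character-level tokenization: Japanese text splits into individual characters
# rather than subwords (a leading sentencepiece marker may appear in the output).
tokenizer = AutoTokenizer.from_pretrained("ku-nlp/deberta-v2-base-japanese-char-wwm")
print(tokenizer.tokenize("自然言語処理"))
```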

Q: What are the recommended use cases?

The model excels at masked language modeling and can be fine-tuned for downstream tasks such as text classification, named entity recognition, and question answering in Japanese.
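As a sketch of the fine-tuning path, the checkpoint can be loaded with a task head from transformers. The label set and example sentences below are placeholders, not part of this card; a real run would train over a labeled dataset with Trainer or a full training loop.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical fine-tuning sketch for binary text classification; the labels
# and example sentences are placeholders for illustration only.
model_id = "ku-nlp/deberta-v2-base-japanese-char-wwm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Character-level tokenization: raw Japanese text can be fed in directly.
batch = tokenizer(
    ["この映画は素晴らしかった。", "全く面白くなかった。"],
    padding=True,
    truncation=True,
    max_length=512,            # matches the pre-training sequence length
    return_tensors="pt",
)
labels = torch.tensor([1, 0])  # placeholder labels: positive / negative

outputs = model(**batch, labels=labels)  # one illustrative forward/backward step
outputs.loss.backward()
```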
