deberta-v2-base-japanese-char-wwm

Maintained By
ku-nlp

DeBERTa V2 Base Japanese Character WWM

  • Parameter Count: 122M
  • License: CC-BY-SA-4.0
  • Training Data: Wikipedia, CC-100, OSCAR
  • Token Type: Character-level
  • Author: ku-nlp

What is deberta-v2-base-japanese-char-wwm?

This is a Japanese language model based on the DeBERTa V2 architecture that combines character-level tokenization with whole word masking (WWM). Pre-trained on 171GB of Japanese text, it serves as a strong general-purpose base model for Japanese natural language processing.
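To illustrate what character-level tokenization means in practice, the snippet below loads the tokenizer from the Hugging Face Hub and tokenizes a short phrase. This is a minimal sketch assuming standard transformers usage; the example phrase is illustrative, and the exact token strings depend on the tokenizer's sentencepiece model.

```python
# Minimal sketch: inspect the character-level tokenizer via the standard
# transformers API. The example phrase is illustrative only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ku-nlp/deberta-v2-base-japanese-char-wwm")

# A character-level vocabulary splits Japanese text into (roughly) one token
# per character rather than into larger subwords.
print(tokenizer.tokenize("自然言語処理"))
```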

Implementation Details

The model was trained on 8 NVIDIA A100-SXM4-40GB GPUs over 20 days, using character-level sentencepiece tokenization with a 22,012-token vocabulary. Training followed a linear learning-rate schedule with warmup and ran for 320,000 steps; the key hyperparameters are listed below, with a configuration sketch after the list.

  • Learning rate: 2e-4 with Adam optimizer
  • Batch size: 2,208 (total)
  • Sequence length: 512 tokens
  • Training corpus: 171GB combined data
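For readers who want to map these numbers onto code, the following is a hedged sketch of the reported optimization setup using PyTorch and transformers. It is not the authors' actual pre-training script, and the warmup length is an assumed placeholder since the card only states that warmup was used.

```python
# Hedged sketch of the reported optimization setup (not the authors' script).
import torch
from transformers import DebertaV2Config, DebertaV2ForMaskedLM, get_linear_schedule_with_warmup

# Character-level vocabulary size and sequence length from the card.
config = DebertaV2Config(vocab_size=22012, max_position_embeddings=512)
model = DebertaV2ForMaskedLM(config)  # randomly initialized, as in pre-training

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # peak learning rate from the card
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10_000,     # assumption: the card only says "with warmup"
    num_training_steps=320_000,  # total training steps from the card
)
```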

Core Capabilities

  • Masked Language Modeling (see the usage sketch after this list)
  • Character-level tokenization
  • Whole Word Masking
  • Fine-tuning support for downstream tasks
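The masked language modeling capability can be exercised directly through the transformers fill-mask pipeline. The snippet below is a minimal sketch; the example sentence and the printed predictions are illustrative only.

```python
# Minimal fill-mask sketch using the transformers pipeline.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="ku-nlp/deberta-v2-base-japanese-char-wwm")

# With character-level tokenization, each [MASK] stands in for a single character.
print(fill_mask("京都大学で自然言語処理を[MASK]究する。"))
```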

Frequently Asked Questions

Q: What makes this model unique?

This model combines character-level tokenization with whole word masking, making it particularly effective for Japanese text processing. Its training on a diverse dataset including Wikipedia, CC-100, and OSCAR provides robust language understanding capabilities.

Q: What are the recommended use cases?

The model excels at masked language modeling and can be fine-tuned for downstream Japanese NLP tasks such as text classification, named entity recognition, and question answering.
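As a starting point for fine-tuning, the sketch below wires the model into a sequence classification head with the transformers Trainer. The dataset, label count, and hyperparameters are illustrative assumptions, not values from the model card.

```python
# Hedged fine-tuning sketch for Japanese text classification.
# Dataset loading is omitted; train_dataset/eval_dataset are assumed to be
# datasets.Dataset objects with "text" and "label" columns.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "ku-nlp/deberta-v2-base-japanese-char-wwm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)  # assumed binary task

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

args = TrainingArguments(
    output_dir="deberta-ja-char-cls",
    learning_rate=2e-5,               # typical fine-tuning rate, not from the card
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

# trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
#                   train_dataset=train_dataset.map(tokenize, batched=True),
#                   eval_dataset=eval_dataset.map(tokenize, batched=True))
# trainer.train()
```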
