# DeBERTa V2 Base Japanese
| Property | Value |
|---|---|
| Parameter Count | 137M |
| License | CC-BY-SA-4.0 |
| Training Data | Japanese Wikipedia, CC-100, OSCAR |
| MLM Accuracy | 0.779 |
## What is deberta-v2-base-japanese?
deberta-v2-base-japanese is a Japanese language model based on the DeBERTa V2 architecture, pre-trained by ku-nlp on a large corpus of Japanese text. With 137 million parameters, it delivers strong results on masked language modeling and a range of downstream Japanese NLP tasks.
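As a quick illustration of the masked language modeling interface, the sketch below loads the checkpoint through Hugging Face transformers. It assumes the model id `ku-nlp/deberta-v2-base-japanese` and that the input text has already been segmented into whitespace-separated words (e.g. with Juman++); the exact preprocessing expected by the upstream pipeline may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "ku-nlp/deberta-v2-base-japanese"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Input text pre-segmented into words (spaces between words), with one [MASK].
text = "京都 大学 で 自然 言語 処理 を [MASK] する 。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary entry.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```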
## Implementation Details
The model was trained on a combined 171GB dataset comprising Japanese Wikipedia, CC-100, and OSCAR corpora. Training ran on 8 NVIDIA A100-SXM4-40GB GPUs over three weeks. Tokenization is two-stage: text is first segmented into words with Juman++ and then split into subwords with a sentencepiece model using a 32,000-token vocabulary (see the sketch after the list below).
- Training batch size: 2,112
- Learning rate: 2e-4
- Training steps: 500,000
- Sequence length: 512
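To make the two-stage tokenization concrete, here is a minimal sketch of pre-segmenting raw text with Juman++ before handing it to the sentencepiece-based tokenizer. It assumes the Juman++ binary and the pyknp Python binding are installed; the exact pipeline used during pre-training may differ.

```python
from pyknp import Juman
from transformers import AutoTokenizer

jumanpp = Juman()  # wraps the locally installed Juman++ binary

def segment(text: str) -> str:
    """Split raw Japanese text into whitespace-separated words with Juman++."""
    result = jumanpp.analysis(text)
    return " ".join(m.midasi for m in result.mrph_list())

tokenizer = AutoTokenizer.from_pretrained("ku-nlp/deberta-v2-base-japanese")

raw = "京都大学で自然言語処理を研究する。"
words = segment(raw)                   # word-segmented string, e.g. "京都 大学 で ..."
subwords = tokenizer.tokenize(words)   # sentencepiece subword tokens
print(subwords)
```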
## Core Capabilities
- Masked Language Modeling with 0.779 accuracy
- Strong performance on JGLUE benchmark tasks
- Specialized for Japanese text processing
- Support for fine-tuning on downstream tasks
## Frequently Asked Questions
### Q: What makes this model unique?
It pairs the DeBERTa V2 architecture (disentangled attention with an enhanced mask decoder) with large-scale Japanese pre-training and a two-stage tokenization pipeline, Juman++ word segmentation followed by sentencepiece subwords, yielding competitive performance on Japanese NLP benchmarks.
### Q: What are the recommended use cases?
The model excels in masked language modeling tasks and can be fine-tuned for various downstream applications including sentiment analysis (MARC-ja), textual similarity (JSTS), natural language inference (JNLI), and question answering (JSQuAD, JComQA).
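For illustration, below is a hedged fine-tuning sketch for a binary sentence-classification task in the style of MARC-ja, using the standard transformers Trainer. The tiny in-memory dataset, column names, and hyperparameters are placeholders, not the settings used in any published JGLUE evaluation; inputs are assumed to be pre-segmented with Juman++.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)

model_id = "ku-nlp/deberta-v2-base-japanese"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Tiny illustrative dataset of pre-segmented reviews (hypothetical examples);
# a real MARC-ja run would load the JGLUE data instead.
train_data = Dataset.from_dict({
    "text": ["この 商品 は とても 良い 。", "この 商品 は 最悪 だっ た 。"],
    "label": [1, 0],
})

def preprocess(batch):
    # Tokenize the pre-segmented text; padding is handled by the collator.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_data = train_data.map(preprocess, batched=True)

args = TrainingArguments(
    output_dir="deberta-v2-base-japanese-marc-ja",  # illustrative output path
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_data,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```

The same pattern extends to the other listed tasks by swapping in the appropriate dataset and model head (e.g. a question-answering head for JSQuAD).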