japanese-gpt2-medium
| Property | Value |
|---|---|
| Parameter Count | 361M |
| License | MIT |
| Research Paper | View Paper |
| Training Data | Japanese CC-100 and Wikipedia |
| Architecture | 24-layer, 1024-hidden-size transformer |
What is japanese-gpt2-medium?
japanese-gpt2-medium is a medium-sized Japanese language model developed by rinna Co., Ltd. for Japanese text generation. It has 361 million parameters and was trained on large Japanese text corpora, namely Japanese CC-100 and Japanese Wikipedia.
Implementation Details
The model uses a transformer architecture with 24 layers and 1024-dimensional hidden states. Training ran on 8 V100 GPUs for roughly 30 days and reached a perplexity of about 18 on the validation set. Tokenization is handled by a sentencepiece tokenizer trained on Japanese Wikipedia.
- Transformer-based architecture with 24 layers
- 1024-dimensional hidden states
- Trained on Japanese CC-100 and Wikipedia datasets
- Optimized for Japanese language understanding and generation
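The snippet below is a minimal loading sketch with Hugging Face transformers. The Hub ID `rinna/japanese-gpt2-medium` is an assumption based on the developer named above, and the slow (sentencepiece-backed) tokenizer is requested explicitly; adjust both to match your setup.

```python
# Minimal loading sketch (assumed Hub ID: rinna/japanese-gpt2-medium).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/japanese-gpt2-medium"  # assumption, not stated in this card

# use_fast=False keeps the sentencepiece-based (slow) tokenizer described above.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sanity-check the architecture: 24 layers, 1024-dimensional hidden states.
print(model.config.n_layer, model.config.n_embd)  # expected: 24 1024
```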
Core Capabilities
- Advanced Japanese text generation
- Language modeling with low perplexity
- Efficient tokenization using sentencepiece
- Compatible with Hugging Face's transformers library
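As a usage example, here is a short generation sketch that reuses the `tokenizer` and `model` objects from the loading sketch above; the prompt and sampling settings are illustrative rather than recommended values.

```python
import torch

prompt = "昔々あるところに、"  # "Once upon a time, ..."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_length=60,   # illustrative; tune for your task
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```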
Frequently Asked Questions
Q: What makes this model unique?
The model is optimized specifically for Japanese: it combines Japanese CC-100 and Wikipedia training data with a custom sentencepiece tokenizer, which makes it particularly effective for Japanese text generation tasks. The sketch below illustrates the tokenizer in action.
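This small sketch (again reusing `tokenizer` from the loading example) prints the sentencepiece pieces and token IDs for a short Japanese sentence; the exact segmentation depends on the trained vocabulary.

```python
text = "自然言語処理は面白い。"  # "Natural language processing is interesting."
print(tokenizer.tokenize(text))  # subword pieces from the sentencepiece model
print(tokenizer.encode(text))    # corresponding token IDs
```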
Q: What are the recommended use cases?
The model is well-suited for Japanese text generation tasks, language modeling, and general Japanese NLP applications. It's particularly useful for researchers and developers working on Japanese language AI applications.