japanese-gpt2-small
| Property | Value |
|---|---|
| Parameter Count | 123M |
| License | MIT |
| Research Paper | View Paper |
| Training Data | Japanese CC-100 and Wikipedia |
| Architecture | 12-layer, 768-hidden-size transformer |
What is japanese-gpt2-small?
japanese-gpt2-small is a compact Japanese language model developed by rinna Co., Ltd. It's a GPT-2 variant specifically trained for Japanese text generation, utilizing a transformer-based architecture with 123 million parameters. The model achieves approximately 21 perplexity on its validation set, demonstrating strong performance in Japanese language understanding and generation.
Implementation Details
The model employs a SentencePiece-based tokenizer trained on Japanese Wikipedia. Training ran for approximately 15 days on 8 V100 GPUs over the Japanese CC-100 and Wikipedia corpora. The implementation is compatible with the Hugging Face Transformers library, making it easily accessible to developers; a loading and generation sketch follows the list below.
- 12-layer transformer architecture with 768 hidden dimensions
- SentencePiece tokenization optimized for Japanese text
- Trained on Japanese CC-100 and Wikipedia corpora
- Compatible with PyTorch and TensorFlow frameworks
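A minimal loading and generation sketch, assuming the checkpoint is published on the Hugging Face Hub under the ID rinna/japanese-gpt2-small and loaded with the slow (SentencePiece) tokenizer; the prompt and sampling parameters are illustrative only:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Model ID assumed to be "rinna/japanese-gpt2-small" on the Hugging Face Hub.
# use_fast=False keeps the SentencePiece-based (slow) tokenizer.
tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
model.eval()

# Encode a Japanese prompt and sample a continuation.
prompt = "昔々あるところに、"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        inputs["input_ids"],
        max_length=60,
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```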
Core Capabilities
- Japanese text generation and completion
- Language modeling, reaching roughly 21 perplexity on its validation set (a measurement sketch follows this list)
- Efficient processing of Japanese characters and grammar
- Suitable for a range of Japanese NLP tasks
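As a rough illustration of the perplexity figure quoted above, the sketch below scores a short snippet with the model; it is not the evaluation protocol behind the reported validation score, and the model ID and sample text are assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative only: the model ID and sample text are assumptions, not the
# official evaluation setup behind the ~21 validation perplexity figure.
tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
model.eval()

text = "吾輩は猫である。名前はまだ無い。"
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]

with torch.no_grad():
    # Passing labels=input_ids returns the mean token-level cross-entropy loss.
    loss = model(input_ids, labels=input_ids).loss

# Perplexity is the exponential of the average cross-entropy.
print(f"Perplexity on this snippet: {torch.exp(loss).item():.1f}")
```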
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for being specifically optimized for Japanese language processing, using a carefully curated training dataset and specialized tokenization approach. Its relatively small size (123M parameters) makes it practical for deployment while maintaining good performance.
Q: What are the recommended use cases?
The model is well-suited for Japanese text generation tasks, including creative writing assistance, content completion, and general language modeling applications. It's particularly valuable for developers looking for a balance between model size and performance in Japanese NLP applications.