japanese-gpt2-small
| Property | Value |
|---|---|
| Parameter Count | 123M |
| License | MIT |
| Research Paper | View Paper |
| Training Data | Japanese CC-100 and Wikipedia |
| Architecture | 12-layer, 768-hidden-size transformer |
What is japanese-gpt2-small?
japanese-gpt2-small is a compact Japanese language model developed by rinna Co., Ltd. It's a GPT-2 variant specifically trained for Japanese text generation, utilizing a transformer-based architecture with 123 million parameters. The model achieves approximately 21 perplexity on its validation set, demonstrating strong performance in Japanese language understanding and generation.
Implementation Details
The model employs a SentencePiece-based tokenizer trained on Japanese Wikipedia. Training ran for approximately 15 days on 8 V100 GPUs over the Japanese CC-100 and Wikipedia corpora. The implementation is compatible with the Hugging Face Transformers library, making it easily accessible to developers; a loading and generation sketch follows the list below.
- 12-layer transformer architecture with 768 hidden dimensions
- SentencePiece tokenization optimized for Japanese text
- Trained on Japanese CC-100 and Wikipedia corpora
- Compatible with PyTorch and TensorFlow frameworks
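A minimal loading and generation sketch, assuming the checkpoint is published on the Hugging Face Hub under the ID rinna/japanese-gpt2-small and loaded with the slow (SentencePiece) tokenizer; the prompt and sampling parameters are illustrative only:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Model ID assumed to be "rinna/japanese-gpt2-small" on the Hugging Face Hub.
# use_fast=False keeps the SentencePiece-based (slow) tokenizer.
tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
model.eval()

# Encode a Japanese prompt and sample a continuation.
prompt = "昔々あるところに、"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        inputs["input_ids"],
        max_length=60,
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```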
Core Capabilities
- Japanese text generation and completion
- Language modeling, reaching roughly 21 perplexity on its validation set (a measurement sketch follows this list)
- Efficient processing of Japanese characters and grammar
- Suitable for a range of Japanese NLP tasks
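As a rough illustration of the perplexity figure quoted above, the sketch below scores a short snippet with the model; it is not the evaluation protocol behind the reported validation score, and the model ID and sample text are assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative only: the model ID and sample text are assumptions, not the
# official evaluation setup behind the ~21 validation perplexity figure.
tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
model.eval()

text = "吾輩は猫である。名前はまだ無い。"
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]

with torch.no_grad():
    # Passing labels=input_ids returns the mean token-level cross-entropy loss.
    loss = model(input_ids, labels=input_ids).loss

# Perplexity is the exponential of the average cross-entropy.
print(f"Perplexity on this snippet: {torch.exp(loss).item():.1f}")
```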
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for being specifically optimized for Japanese language processing, using a carefully curated training dataset and specialized tokenization approach. Its relatively small size (123M parameters) makes it practical for deployment while maintaining good performance.
Q: What are the recommended use cases?
The model is well-suited for Japanese text generation tasks, including creative writing assistance, content completion, and general language modeling applications. It's particularly valuable for developers looking for a balance between model size and performance in Japanese NLP applications.