japanese-gpt2-small

Maintained By: rinna

Property | Value
Parameter Count | 123M
License | MIT
Training Data | Japanese CC-100 and Japanese Wikipedia
Architecture | 12-layer, 768-hidden-size transformer

What is japanese-gpt2-small?

japanese-gpt2-small is a compact Japanese language model developed by rinna Co., Ltd. It is a GPT-2 variant trained specifically for Japanese text generation, using a transformer architecture with 123 million parameters. The model reaches approximately 21 perplexity on its validation set, indicating solid Japanese language modeling quality for its size.

Implementation Details

The model uses a sentencepiece tokenizer whose vocabulary was trained on Japanese Wikipedia. Training ran for approximately 15 days on 8 V100 GPUs over Japanese CC-100 and Japanese Wikipedia data. The implementation is compatible with the Hugging Face Transformers library, so the checkpoint can be loaded in a few lines of code (see the sketch after the list below).

  • 12-layer transformer architecture with 768 hidden dimensions
  • Sentencepiece tokenization optimized for Japanese text
  • Trained on high-quality Japanese corpus data
  • Compatible with PyTorch and TensorFlow frameworks
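
A minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub under the ID rinna/japanese-gpt2-small (the ID rinna uses for its released models):

```python
from transformers import T5Tokenizer, AutoModelForCausalLM

# The vocabulary is a sentencepiece model, exposed through T5Tokenizer.
tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-small")
tokenizer.do_lower_case = True  # rinna's model cards set this flag when loading

model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
model.eval()  # inference mode
```

Note that the sentencepiece package must be installed for the tokenizer to load.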

Core Capabilities

  • Japanese text generation and completion (see the sketch below)
  • Language modeling, reaching roughly 21 perplexity on validation data
  • Efficient processing of Japanese characters and grammar
  • Suitable as a base for various Japanese NLP tasks
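
A short generation sketch in the same setup; the prompt and sampling parameters below are illustrative choices, not values from the original model card:

```python
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-small")
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
model.eval()

prompt = "昔々あるところに、"  # illustrative prompt: "Once upon a time..."
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=60,
        do_sample=True,   # sample rather than decode greedily
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Top-k/top-p sampling tends to give more varied Japanese output than greedy decoding for a model of this size.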

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for being trained specifically for Japanese, with a Japanese-only corpus (CC-100 and Wikipedia) and a sentencepiece tokenizer built for Japanese text. Its relatively small size (123M parameters) makes it practical to deploy while retaining good generation quality.

Q: What are the recommended use cases?

The model is well-suited for Japanese text generation tasks, including creative writing assistance, content completion, and general language modeling applications. It's particularly valuable for developers looking for a balance between model size and performance in Japanese NLP applications.
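
For the language-modeling case, note that perplexity is simply the exponential of the mean cross-entropy loss, so the model can be evaluated on any Japanese text in a few lines. A rough sketch (the sample sentence is illustrative; the validation set behind the reported ~21 figure is not identified here):

```python
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-small")
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
model.eval()

text = "吾輩は猫である。名前はまだ無い。"  # illustrative sample sentence
input_ids = tokenizer.encode(text, return_tensors="pt")

# With labels equal to input_ids, the model returns the mean token-level
# cross-entropy loss; perplexity is exp(loss).
with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss

print(f"perplexity: {torch.exp(loss).item():.2f}")
```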
