japanese-gpt2-medium

Maintained by: rinna

Parameter Count: 361M
License: MIT
Training Data: Japanese CC-100 and Wikipedia
Architecture: 24-layer, 1024-hidden-size transformer

What is japanese-gpt2-medium?

japanese-gpt2-medium is a medium-sized Japanese language model developed by rinna Co., Ltd. for Japanese text generation. It has 361 million parameters and was trained on large Japanese corpora, namely CC-100 and Japanese Wikipedia.

Implementation Details

The model uses a transformer architecture with 24 layers and a hidden size of 1024. Training ran on 8 V100 GPUs for approximately 30 days and reached a perplexity of around 18 on the validation set. Tokenization is handled by a sentencepiece-based tokenizer whose vocabulary was trained on Japanese Wikipedia. A minimal loading sketch follows the list below.

  • Transformer-based architecture with 24 layers
  • 1024-dimensional hidden states
  • Trained on Japanese CC-100 and Wikipedia datasets
  • Optimized for Japanese language understanding and generation
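The snippet below is a minimal loading sketch, assuming the public Hugging Face checkpoint name rinna/japanese-gpt2-medium and that the transformers and sentencepiece packages are installed; the sentencepiece vocabulary is loaded through T5Tokenizer, following the conventions on the model card.

```python
# Minimal loading sketch (assumes `pip install transformers sentencepiece`).
from transformers import T5Tokenizer, AutoModelForCausalLM

# The sentencepiece vocabulary is exposed through T5Tokenizer.
tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True  # lowercase input, per the model card's usage notes

model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
model.eval()  # inference mode
```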

Core Capabilities

  • Advanced Japanese text generation
  • Language modeling with low perplexity
  • Efficient tokenization using sentencepiece
  • Compatible with Hugging Face's transformers library
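As a rough usage sketch of generation through the standard transformers generate API (the checkpoint name is as above; the prompt and sampling parameters are illustrative, not taken from the model card):

```python
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
model.eval()

prompt = "昔々あるところに、"  # "Once upon a time, ..."
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=60,      # illustrative value
        do_sample=True,     # nucleus/top-k sampling for varied continuations
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```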

Frequently Asked Questions

Q: What makes this model unique?

The model is optimized specifically for Japanese: it combines CC-100 and Wikipedia training data with a custom sentencepiece tokenizer whose vocabulary was trained on Japanese Wikipedia, which makes it effective for Japanese text generation tasks. The sketch below shows how to inspect that tokenizer.
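A small sketch for inspecting the tokenizer (same assumed checkpoint name; the exact subword segmentation depends on the trained vocabulary):

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True

# See how the Japanese-specific sentencepiece vocabulary segments text.
tokens = tokenizer.tokenize("日本語の文章を生成するモデルです。")
print(tokens)                                # subword pieces
print(tokenizer.convert_tokens_to_ids(tokens))  # corresponding vocabulary ids
```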

Q: What are the recommended use cases?

The model is well suited to Japanese text generation, language-model scoring, and general Japanese NLP applications, and is aimed at researchers and developers building Japanese-language systems.
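One concrete language-modeling use is scoring text. The sketch below (same assumed checkpoint; the example sentence is arbitrary) derives an approximate perplexity from the model's own cross-entropy loss:

```python
import math
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
model.eval()

sentence = "日本語の自然な文章です。"  # arbitrary example sentence
input_ids = tokenizer.encode(sentence, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids returns the mean next-token cross-entropy.
    loss = model(input_ids, labels=input_ids).loss

print(f"perplexity ~ {math.exp(loss.item()):.1f}")
```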
