japanese-gpt-neox-3.6b-instruction-sft-v2
| Property | Value |
|---|---|
| Parameter Count | 3.6 billion |
| Model Type | Instruction-tuned language model |
| Architecture | 36-layer transformer, hidden size 2816 |
| License | MIT |
| Authors | Tianyu Zhao and Kei Sawada |
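As a rough sanity check on the table, the parameter count can be estimated from the architecture figures. The sketch below is an approximation, not the authors' accounting. It assumes the common rule of thumb of about 12 x hidden² weights per transformer layer, separate (untied) input and output embedding matrices, and the 32,000-token vocabulary noted under Implementation Details. Biases and layer-norm weights are ignored, and `estimate_params` is a hypothetical helper name.

```python
def estimate_params(n_layers: int, hidden: int, vocab: int,
                    tied_embeddings: bool = False) -> int:
    """Approximate the weight count of a decoder-only transformer."""
    # Attention (Q, K, V, output projections): 4 * hidden^2
    # Feed-forward with 4x expansion (up and down projections): 8 * hidden^2
    per_layer = 12 * hidden * hidden
    # One embedding matrix if tied, two if input and output are separate
    # (untied embeddings are an assumption here).
    embedding_matrices = 1 if tied_embeddings else 2
    embeddings = embedding_matrices * vocab * hidden
    return n_layers * per_layer + embeddings


estimate = estimate_params(n_layers=36, hidden=2816, vocab=32_000)
print(f"{estimate / 1e9:.2f} billion parameters")
```

Under these assumptions the estimate comes out at roughly 3.6 billion, in line with the stated parameter count.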
What is japanese-gpt-neox-3.6b-instruction-sft-v2?
This is a Japanese language model based on the GPT-NeoX architecture, fine-tuned for instruction following and conversational tasks. Compared with its predecessor, it was trained on a different data split and performs better in ChatGPT-based automated evaluations.
Implementation Details
The model uses a SentencePiece tokenizer with a 32,000-token vocabulary. Conversations are formatted as alternating turns between a user (ユーザー) and the system (システム).
- Tokenizer with byte fallback, so characters outside the vocabulary are encoded as UTF-8 bytes rather than mapped to an unknown token
- Conversation format built on ユーザー (user) and システム (system) roles
- Fine-tuned on Japanese translations of datasets including Anthropic HH RLHF, FLAN, and the Stanford Human Preferences Dataset
- Supports standard sampling controls such as temperature and repetition penalty
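The conversation format and generation parameters above can be sketched as follows. This is a minimal sketch, not the authors' reference code. The `<NL>` turn separator, the trailing `システム: ` cue, the Hugging Face model id, the `use_fast=False` setting, and the default sampling values are assumptions based on how this model family is commonly used, so check them against the upstream model card. `build_prompt` and `generate_reply` are hypothetical helper names.

```python
MODEL_ID = "rinna/japanese-gpt-neox-3.6b-instruction-sft-v2"  # assumed hub id
USER = "ユーザー"
SYSTEM = "システム"
NEWLINE = "<NL>"  # assumed turn separator and newline replacement


def build_prompt(turns):
    """Join (speaker, text) turns into one prompt ending with the system cue."""
    lines = []
    for speaker, text in turns:
        if speaker not in (USER, SYSTEM):
            raise ValueError(f"unknown speaker: {speaker}")
        # Literal newlines inside a turn are replaced by the separator token.
        lines.append(speaker + ": " + text.replace("\n", NEWLINE))
    # The trailing cue asks the model to produce the next system turn.
    return NEWLINE.join(lines) + NEWLINE + SYSTEM + ": "


def generate_reply(prompt, temperature=0.7, repetition_penalty=1.1,
                   max_new_tokens=256):
    """Sample one reply; downloads the model weights on first use."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    token_ids = tokenizer.encode(
        prompt, add_special_tokens=False, return_tensors="pt"
    )
    with torch.no_grad():
        output_ids = model.generate(
            token_ids,
            do_sample=True,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            repetition_penalty=repetition_penalty,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, then restore real newlines.
    reply = tokenizer.decode(output_ids[0][token_ids.size(1):].tolist())
    return reply.replace(NEWLINE, "\n")


prompt = build_prompt([
    (USER, "日本のおすすめの観光地を教えてください。"),
    (SYSTEM, "どの地域の観光地が知りたいですか？"),
    (USER, "関東地方でお願いします。"),
])
```

Calling `generate_reply(prompt)` would then sample a system turn. Lower temperatures make the output more deterministic, and a repetition penalty above 1.0 discourages repeated phrases.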
Core Capabilities
- Natural Japanese language understanding and generation
- Instruction-following in conversational contexts
- Handles complex dialogue interactions
- Preserves whitespace and special characters accurately
- 55% win rate against the previous version in ChatGPT-based automated evaluations
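The byte fallback behavior that lets the tokenizer represent characters outside its vocabulary can be illustrated with a toy example. The sketch below is not the model's tokenizer: it works on single characters rather than learned subword pieces, and `toy_tokenize` and `toy_detokenize` are hypothetical names. It only mirrors the SentencePiece convention of writing fallback bytes as `<0xNN>` pieces.

```python
def toy_tokenize(text, vocab):
    """Character-level tokenizer that falls back to UTF-8 byte pieces."""
    pieces = []
    for ch in text:
        if ch in vocab:
            pieces.append(ch)
        else:
            # Unknown character: emit one <0xNN> piece per UTF-8 byte.
            pieces.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return pieces


def toy_detokenize(pieces):
    """Rebuild the original text from character and byte pieces."""
    buf = bytearray()
    for piece in pieces:
        if len(piece) == 6 and piece.startswith("<0x") and piece.endswith(">"):
            buf.append(int(piece[3:5], 16))
        else:
            buf.extend(piece.encode("utf-8"))
    return buf.decode("utf-8")


vocab = set("こんにちは ")
text = "こんにちは 😀"
pieces = toy_tokenize(text, vocab)
print(pieces)
print(toy_detokenize(pieces) == text)
```

Because every character can be written as bytes, the round trip is lossless even for input the vocabulary has never seen. That is why a tokenizer with byte fallback does not need to emit an unknown token.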
Frequently Asked Questions
Q: What makes this model unique?
It combines instruction tuning on Japanese data with a SentencePiece tokenizer that uses byte fallback. As a result, it follows instructions in Japanese conversation better than its predecessor in automated evaluations, and it can still represent characters outside its vocabulary.
Q: What are the recommended use cases?
The model is well suited to Japanese conversational AI applications, such as chatbots, and to instruction-following tasks that require natural Japanese interaction. It is designed to handle both formal and informal conversation styles.