japanese-gpt-neox-3.6b-instruction-sft

rinna

A 3.6B-parameter Japanese language model fine-tuned for instruction following, based on the GPT-NeoX architecture, with a custom tokenizer and a conversation-oriented prompt format.

Property          Value
Parameter Count   3.76B
License           MIT
Paper             Research Paper
Authors           Tianyu Zhao and Kei Sawada
Architecture      36-layer, 2816-hidden-size transformer

What is japanese-gpt-neox-3.6b-instruction-sft?

This is a Japanese language model based on the GPT-NeoX architecture, fine-tuned for instruction following and conversation. It has roughly 3.6 billion parameters and was trained on datasets including Anthropic HH RLHF data and the Stanford Human Preferences Dataset.

Implementation Details

The model uses a SentencePiece-based tokenizer with a 32,000-token vocabulary. The tokenizer supports byte fallback, so characters outside the vocabulary are decomposed into UTF-8 byte pieces instead of being mapped to a single unknown token, and it is configured to handle whitespace in specific ways.
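As a rough illustration of byte fallback, the sketch below uses a toy character-level vocabulary rather than the model's actual SentencePiece tokenizer. The function names and the tiny vocabulary are hypothetical; only the `<0xXX>` notation follows SentencePiece's convention for byte pieces.

```python
def encode_with_byte_fallback(text, vocab):
    """Toy character-level tokenizer illustrating byte fallback.

    Characters found in `vocab` are emitted as-is; any other character is
    decomposed into its UTF-8 bytes, each written as a <0xXX> piece.
    NOTE: this is an illustrative stand-in, not the model's tokenizer.
    """
    pieces = []
    for ch in text:
        if ch in vocab:
            pieces.append(ch)
        else:
            pieces.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return pieces


def decode_with_byte_fallback(pieces):
    """Rebuild the text by turning every piece back into raw bytes."""
    buf = bytearray()
    for piece in pieces:
        is_byte_piece = (
            len(piece) == 6 and piece.startswith("<0x") and piece.endswith(">")
        )
        if is_byte_piece:
            buf.append(int(piece[3:5], 16))      # one byte from its hex form
        else:
            buf.extend(piece.encode("utf-8"))    # ordinary in-vocabulary piece
    return buf.decode("utf-8")


if __name__ == "__main__":
    toy_vocab = {"こ", "ん"}                      # hypothetical vocabulary
    pieces = encode_with_byte_fallback("こん猫", toy_vocab)
    print(pieces)
    print(decode_with_byte_fallback(pieces))
```

Because every out-of-vocabulary character still maps to a sequence of known pieces, the round trip loses no information, which is the property the byte fallback setting is meant to provide.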

  • Custom input/output format for conversation handling between "ユーザー" and "システム"
  • Special newline handling using the "<NL>" token
  • Tokenization without automatic whitespace prepending
  • Support for multiple tensor types including F32, FP16, and U8
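The conversation format described above can be sketched as a small prompt builder. This is a minimal sketch under stated assumptions: the helper name `build_prompt` is hypothetical, the newline token is assumed to be written literally as `<NL>`, and each turn is assumed to take the form `speaker: text`, with the prompt ending on an open "システム" turn for the model to complete.

```python
USER = "ユーザー"      # speaker label for the human side
SYSTEM = "システム"    # speaker label for the model side
NEWLINE_TOKEN = "<NL>"  # assumed literal form of the newline token


def build_prompt(turns, newline_token=NEWLINE_TOKEN):
    """Flatten (speaker, text) turns into a single prompt string.

    Real newlines inside an utterance are replaced by the newline token,
    turns are joined with the same token, and an empty system turn is
    appended so the model continues as the system speaker.
    """
    lines = [
        f"{speaker}: {text.replace(chr(10), newline_token)}"
        for speaker, text in turns
    ]
    return newline_token.join(lines) + newline_token + f"{SYSTEM}: "


if __name__ == "__main__":
    conversation = [
        (USER, "日本のおすすめの観光地を教えてください。"),
        (SYSTEM, "どの地域の観光地が知りたいですか？"),
        (USER, "関西地方です。\n二つ挙げてください。"),
    ]
    print(build_prompt(conversation))
```

The resulting string would then be tokenized and passed to the model. Since the string already encodes the turn structure, the exact tokenizer and generation settings are best taken from the model's own documentation rather than from this sketch.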

Core Capabilities

  • Instruction-following conversation in Japanese
  • Text generation with temperature-controlled sampling
  • Efficient handling of unknown characters through byte fallback
  • Preservation of whitespace formatting
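To make the temperature control mentioned above concrete, here is a small, library-free sketch of temperature-scaled sampling over a list of logits. It illustrates the general technique, not this model's generation code; the function names and example logits are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature.

    Temperatures below 1 sharpen the distribution; temperatures above 1
    flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.0]         # hypothetical next-token scores
    for t in (0.5, 1.0, 2.0):
        print(t, softmax_with_temperature(example_logits, t))
    print(sample_token(example_logits, temperature=0.7, rng=random.Random(0)))
```

Lower temperatures concentrate probability on the highest-scoring token and make output more predictable, while higher temperatures spread it out and make output more varied.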

Frequently Asked Questions

Q: What makes this model unique?

The model combines a Japanese-specific tokenizer with fine-tuning on instruction-following data. Its size (3.6B parameters), together with a SentencePiece tokenizer that supports byte fallback and preserves whitespace, makes it a practical choice for Japanese text generation and conversation.

Q: What are the recommended use cases?

The model is suited to Japanese conversational AI applications, instruction-following tasks, and general text generation, particularly where an application needs multi-turn interaction in natural Japanese that takes earlier context into account.
