japanese-gpt-neox-3.6b-instruction-sft

rinna

A 3.6B parameter Japanese language model fine-tuned for instruction following, based on GPT-NeoX architecture with specialized tokenization and conversation capabilities.

  • Parameter Count: 3.76B
  • License: MIT
  • Paper: Research Paper
  • Authors: Tianyu Zhao and Kei Sawada
  • Architecture: 36-layer, 2816-hidden-size transformer

What is japanese-gpt-neox-3.6b-instruction-sft?

This is a Japanese language model based on the GPT-NeoX architecture, specifically fine-tuned for instruction-following and conversational tasks. With 3.6 billion parameters, it represents a significant step forward in Japanese natural language processing; its instruction tuning draws on high-quality datasets, including the Anthropic HH-RLHF data and the Stanford Human Preferences (SHP) dataset.

Implementation Details

The model utilizes a specialized tokenization system based on SentencePiece with a 32,000-token vocabulary. Notable technical features include byte fallback capability for handling unknown tokens and specific configurations for whitespace handling.

  • Custom input/output format for conversation handling between "ユーザー" and "システム"
  • Special newline handling using "NL" token
  • Optimized tokenization without automatic whitespace prepending
  • Support for multiple tensor types including F32, FP16, and U8
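The custom conversation format above can be sketched as a small prompt-building helper. The "ユーザー"/"システム" speaker labels and the "<NL>" newline token follow rinna's published usage notes, but treat the exact separator string as an assumption to verify against the tokenizer you actually load:

```python
# Sketch of the ユーザー/システム prompt format, assuming "<NL>"
# is the model's newline token (verify against the real tokenizer).

def build_prompt(turns):
    """turns: list of (speaker, text) pairs, e.g. ("ユーザー", "こんにちは")."""
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    # The prompt ends with an open システム turn so the model
    # generates the assistant's reply.
    return "<NL>".join(lines) + "<NL>" + "システム: "

prompt = build_prompt([
    ("ユーザー", "日本のおすすめの観光地を教えてください。"),
    ("システム", "どの地域の観光地が知りたいですか？"),
    ("ユーザー", "渋谷の観光地を教えてください。"),
])
```

The resulting string is then tokenized and passed to the model as a single sequence; the trailing "システム: " cues the model to respond in the assistant role.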

Core Capabilities

  • Instruction-following conversation in Japanese
  • Advanced text generation with temperature control
  • Efficient handling of unknown characters through byte fallback
  • Preservation of whitespace formatting

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its specialized Japanese language capabilities and careful fine-tuning on instruction-following tasks. The combination of its size (3.6B parameters) and sophisticated tokenization makes it particularly effective for Japanese text generation and conversation.

Q: What are the recommended use cases?

The model is ideal for Japanese conversational AI applications, instruction-following tasks, and general text generation. It's particularly well-suited for applications requiring natural Japanese language interaction and understanding of context.
