japanese-gpt-neox-3.6b-instruction-sft

Maintained by: rinna

Parameter Count: 3.76B
License: MIT
Paper: Research Paper
Authors: Tianyu Zhao and Kei Sawada
Architecture: 36-layer, 2816-hidden-size transformer

What is japanese-gpt-neox-3.6b-instruction-sft?

This is a Japanese language model based on the GPT-NeoX architecture, fine-tuned for instruction following and conversation. It has 3.6 billion parameters and was fine-tuned on datasets including the Anthropic HH RLHF data and the Stanford Human Preferences Dataset.

Implementation Details

The model uses a SentencePiece-based tokenizer with a 32,000-token vocabulary. Notable technical features include byte fallback for handling out-of-vocabulary characters and specific configurations for whitespace handling.

  • Custom input/output format for conversation handling between "ユーザー" and "システム"
  • Special newline handling using "NL" token
  • Optimized tokenization without automatic whitespace prepending
  • Support for multiple tensor types including F32, FP16, and U8
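
The conversation format above can be sketched as follows. This is a minimal illustration, not code from the model card: `build_prompt` is a hypothetical helper, and it assumes turns are joined by the literal "<NL>" token with the prompt ending in "システム: " so the model continues as the system speaker.

```python
def build_prompt(turns):
    """Join (speaker, text) turns into the model's expected input string.

    Speakers are "ユーザー" (user) and "システム" (system). Newlines inside a
    turn are replaced by the literal "<NL>" token, and the prompt ends with
    "システム: " so the model generates the next system reply.
    """
    body = "<NL>".join(
        f"{speaker}: {text.replace(chr(10), '<NL>')}" for speaker, text in turns
    )
    return body + "<NL>システム: "

prompt = build_prompt([("ユーザー", "日本で一番高い山は何ですか？")])
# → "ユーザー: 日本で一番高い山は何ですか？<NL>システム: "
```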

Core Capabilities

  • Instruction-following conversation in Japanese
  • Advanced text generation with temperature control
  • Efficient handling of unknown characters through byte fallback
  • Preservation of whitespace formatting

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its specialized Japanese language capabilities and careful fine-tuning on instruction-following tasks. The combination of its size (3.6B parameters) and sophisticated tokenization makes it particularly effective for Japanese text generation and conversation.

Q: What are the recommended use cases?

The model is ideal for Japanese conversational AI applications, instruction-following tasks, and general text generation. It's particularly well-suited for applications requiring natural Japanese language interaction and understanding of context.
