# japanese-large-lm-3.6b

| Property | Value |
|---|---|
| Parameter Count | 3.6B |
| Architecture | GPTNeoX |
| License | Apache 2.0 |
| Hidden Dimension | 3072 |
| Attention Heads | 32 |
| Layers | 30 |
## What is japanese-large-lm-3.6b?

japanese-large-lm-3.6b is a Japanese language model developed by LINE Corporation for text generation tasks. It was trained on approximately 650GB of text data drawn from corpora including C4, CC-100, and OSCAR.
## Implementation Details

The model uses a GPTNeoX architecture with RoPE (Rotary Position Embedding) positional encoding, featuring 30 layers and 32 attention heads. Tokenization is handled by a SentencePiece tokenizer with a unigram language model and byte-fallback; no Japanese-specific pre-tokenization is applied, so raw sentences can be passed to the tokenizer directly (see the example after the list below).
- Vocabulary size of 51,200 tokens
- 3072 hidden dimensions
- Achieves 7.50 perplexity on internal validation sets
- Supports FP16 precision for efficient inference
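As a sketch of this raw-text workflow, the snippet below loads the tokenizer via the Transformers library and encodes an unsegmented Japanese sentence. The `line-corporation/japanese-large-lm-3.6b` repository id and the `use_fast=False` flag (to select the SentencePiece tokenizer) are assumptions; verify them against the published model card.

```python
from transformers import AutoTokenizer

# Assumed Hugging Face repository id; check the published model card.
MODEL_ID = "line-corporation/japanese-large-lm-3.6b"

# use_fast=False is assumed here to select the SentencePiece (unigram) tokenizer.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)

# Raw Japanese text goes in directly; no morphological pre-tokenization step.
text = "今日はいい天気なので、"
encoded = tokenizer(text, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(encoded.input_ids[0]))  # unigram subword pieces
```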
## Core Capabilities
- High-quality Japanese text generation
- Efficient processing of raw Japanese text without pre-tokenization
- Flexible integration with the Transformers library (a usage sketch follows this list)
- Supports both CPU and GPU inference
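A minimal inference sketch along these lines, assuming the same repository id as above, loads the model in FP16 when a GPU is available and falls back to FP32 on CPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_ID = "line-corporation/japanese-large-lm-3.6b"  # assumed repository id

use_gpu = torch.cuda.is_available()
dtype = torch.float16 if use_gpu else torch.float32  # FP16 only on GPU

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype)

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if use_gpu else -1,  # GPU index 0, or -1 for CPU
)
print(generator("日本で一番高い山は", max_new_tokens=32, do_sample=True, top_p=0.9))
```

Sampling parameters such as `top_p` and `max_new_tokens` can be tuned per task; greedy decoding (`do_sample=False`) is an alternative when more deterministic completions are preferred.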
## Frequently Asked Questions
Q: What makes this model unique?
The model combines a dedicated focus on Japanese with a substantial parameter count (3.6B) and a GPTNeoX architecture that uses RoPE positional encoding. Because it requires no Japanese-specific pre-tokenization, raw sentences can be fed to the tokenizer directly, which simplifies integration into applications.
Q: What are the recommended use cases?
The model is well-suited for Japanese text generation tasks, including content creation, text completion, and general language modeling applications. Its architecture and training make it particularly effective for tasks requiring understanding of Japanese language nuances and context.