# japanese-large-lm-3.6b

| Property | Value |
|---|---|
| Parameter Count | 3.6B |
| Architecture | GPTNeoX |
| License | Apache 2.0 |
| Hidden Dimension | 3072 |
| Attention Heads | 32 |
| Layers | 30 |
## What is japanese-large-lm-3.6b?

japanese-large-lm-3.6b is a Japanese language model developed by LINE Corporation for text generation tasks. It was trained on approximately 650GB of text data drawn from corpora including C4, CC-100, and OSCAR.
## Implementation Details

The model uses a GPTNeoX architecture with RoPE (Rotary Position Embedding) positional encoding, featuring 30 layers and 32 attention heads. Tokenization is handled by a SentencePiece tokenizer with a unigram language model and byte-fallback; no Japanese-specific pre-tokenization is applied, so raw sentences can be passed to the tokenizer directly (see the example after the list below).
- Vocabulary size of 51,200 tokens
- 3072 hidden dimensions
- Achieves 7.50 perplexity on internal validation sets
- Supports FP16 precision for efficient inference
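As a sketch of this raw-text workflow, the snippet below loads the tokenizer via the Transformers library and encodes an unsegmented Japanese sentence. The `line-corporation/japanese-large-lm-3.6b` repository id and the `use_fast=False` flag (to select the SentencePiece tokenizer) are assumptions; verify them against the published model card.

```python
from transformers import AutoTokenizer

# Assumed Hugging Face repository id; check the published model card.
MODEL_ID = "line-corporation/japanese-large-lm-3.6b"

# use_fast=False is assumed here to select the SentencePiece (unigram) tokenizer.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)

# Raw Japanese text goes in directly; no morphological pre-tokenization step.
text = "今日はいい天気なので、"
encoded = tokenizer(text, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(encoded.input_ids[0]))  # unigram subword pieces
```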
## Core Capabilities
- High-quality Japanese text generation
- Efficient processing of raw Japanese text without pre-tokenization
- Flexible integration with the Transformers library (a usage sketch follows this list)
- Supports both CPU and GPU inference
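A minimal inference sketch along these lines, assuming the same repository id as above, loads the model in FP16 when a GPU is available and falls back to FP32 on CPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_ID = "line-corporation/japanese-large-lm-3.6b"  # assumed repository id

use_gpu = torch.cuda.is_available()
dtype = torch.float16 if use_gpu else torch.float32  # FP16 only on GPU

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype)

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if use_gpu else -1,  # GPU index 0, or -1 for CPU
)
print(generator("日本で一番高い山は", max_new_tokens=32, do_sample=True, top_p=0.9))
```

Sampling parameters such as `top_p` and `max_new_tokens` can be tuned per task; greedy decoding (`do_sample=False`) is an alternative when more deterministic completions are preferred.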
## Frequently Asked Questions
Q: What makes this model unique?
The model combines a dedicated focus on Japanese with a substantial parameter count (3.6B) and a GPTNeoX architecture that uses RoPE positional encoding. Because it requires no Japanese-specific pre-tokenization, raw sentences can be fed to the tokenizer directly, which simplifies integration into applications.
Q: What are the recommended use cases?
The model is well-suited for Japanese text generation tasks, including content creation, text completion, and general language modeling applications. Its architecture and training make it particularly effective for tasks requiring understanding of Japanese language nuances and context.