Sarashina2.2-0.5B
| Property | Value |
|---|---|
| Parameter Count | 0.5 Billion |
| Training Tokens | 10 Trillion |
| License | MIT |
| Author | SB Intuitions |
| Model URL | huggingface.co/sbintuitions/sarashina2.2-0.5b |
What is sarashina2.2-0.5b?
Sarashina2.2-0.5B is a compact language model developed by SB Intuitions with approximately 500 million parameters. It was trained in three phases: pretraining on 10 trillion tokens of Japanese, English, and code data, followed by training on synthetic data for mathematical and coding tasks, and a final fine-tuning phase for specific applications.
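Because the checkpoint is published on the Hugging Face Hub, it can presumably be loaded with the standard transformers auto classes. The snippet below is a minimal sketch, assuming a recent transformers and PyTorch installation; the bfloat16 dtype is an illustrative choice, not a requirement stated by the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the published checkpoint from the Hugging Face Hub.
model_name = "sbintuitions/sarashina2.2-0.5b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Sanity check: the parameter count should come out to roughly 0.5 billion.
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params / 1e9:.2f}B")
```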
Implementation Details
The training methodology emphasizes multilingual capability and specialized task performance. For its size, the model posts strong Japanese benchmark scores: 33.9% on NIILC, 28.8% on JMMLU, 21.6% on MGSM-ja, and 15.2% on JHumanEval. Key characteristics include:
- Multi-phase training architecture
- Specialized synthetic data enhancement
- Optimized for Japanese and English text generation
- Built-in support for coding tasks
Core Capabilities
- Multilingual text generation in Japanese and English (see the prompting sketch after this list)
- Mathematical problem solving
- Code generation and analysis
- Natural language understanding in Japanese context
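Since this is a base completion model rather than a chat model, it is typically prompted with text to be continued. The sketch below exercises the capabilities listed above with a few illustrative prompts; the prompts and sampling settings are placeholders for demonstration, not recommendations from SB Intuitions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sbintuitions/sarashina2.2-0.5b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

# Completion-style prompts covering Japanese text, arithmetic, and code.
prompts = [
    "日本の四季について説明すると、",   # Japanese text generation
    "Q: 12 * 7 = ?\nA:",               # simple arithmetic
    "def fibonacci(n):\n    ",         # code completion
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=64,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```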
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive three-phase training process and its optimization for Japanese-English bilingual use, combined with a relatively compact size of 0.5B parameters, make it particularly efficient for specialized applications.
Q: What are the recommended use cases?
The model is best suited for Japanese-English text generation, mathematical problem solving, and coding tasks. However, users should note that this is a pre-trained model without instruction tuning, so it may require additional fine-tuning for specific applications.
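For readers who want to adapt the base checkpoint, the following is a generic causal-LM fine-tuning sketch using the Hugging Face Trainer. The tiny in-memory dataset, hyperparameters, and output directory are placeholders that show the shape of the workflow, not settings suggested by the model authors.

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "sbintuitions/sarashina2.2-0.5b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Some tokenizers ship without a pad token; fall back to EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus: replace with your own task-specific text.
raw = Dataset.from_dict({"text": ["ここに学習用のテキストを入れます。", "Another training example."]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Causal-LM collator derives labels from the input ids (no masking).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="sarashina2.2-0.5b-finetuned",  # placeholder path
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=1e-5,
    logging_steps=10,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
```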