Sarashina2.2-1B
| Property | Value |
|---|---|
| Parameter Count | 1 Billion |
| Training Data | 10 Trillion Tokens |
| License | MIT |
| Author | SB Intuitions |
| Model URL | huggingface.co/sbintuitions/sarashina2.2-1b |
What is sarashina2.2-1b?
Sarashina2.2-1B is a large language model developed by SB Intuitions, designed to handle both Japanese and English language processing. The model has approximately 1 billion parameters and was trained in three phases on a corpus of 10 trillion tokens.
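The checkpoint is distributed through the Hugging Face Hub (URL above), so it can be loaded with the `transformers` library. The snippet below is a minimal sketch rather than official usage guidance from SB Intuitions; it assumes a recent `transformers` release with PyTorch installed, and the dtype choice is illustrative.

```python
# Minimal loading sketch (assumes transformers + PyTorch are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sbintuitions/sarashina2.2-1b"  # repo name from the model URL above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Sanity-check the advertised parameter count (~1 billion).
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params / 1e9:.2f}B")
```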
Implementation Details
The training process consisted of three distinct phases: initial pre-training on multilingual web corpora (Japanese, English, and code), synthetic-data training to strengthen mathematical and coding capabilities, and final refinement on specialized application tasks. On Japanese benchmarks, the model achieves 47.2% on NIILC, 38.2% on JMMLU, 39.6% on MGSM-ja, and 20.7% on JHumanEval.
- Three-phase training methodology
- Optimized for Japanese and English language processing
- Enhanced mathematical and coding capabilities
- Competitive performance on multiple Japanese benchmarks
Core Capabilities
- Bilingual text generation in Japanese and English
- Mathematical problem-solving capabilities
- Code generation and understanding
- Strong performance on Japanese QA tasks
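To illustrate the bilingual text-generation capability listed above, here is a hedged sketch of completion-style inference using the `transformers` `pipeline` API. The Japanese prompt and the sampling parameters are arbitrary examples, not values recommended by the model authors.

```python
# Completion-style generation sketch; prompt and sampling settings are illustrative only.
from transformers import pipeline

generator = pipeline("text-generation", model="sbintuitions/sarashina2.2-1b")

# Japanese prompt meaning roughly "Speaking of Osaka's specialty food, ..."
# As a base model, it continues the text rather than answering a chat-style instruction.
prompt = "大阪の名物といえば"
outputs = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.9)
print(outputs[0]["generated_text"])
```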
Frequently Asked Questions
Q: What makes this model unique?
The model's three-phase training approach and joint optimization for Japanese and English, combined with its strong performance on mathematical and coding tasks, set it apart from other models in its parameter range.
Q: What are the recommended use cases?
The model is well-suited for Japanese-English language processing tasks, mathematical problem-solving, and coding applications. However, users should note that this is a pre-trained model without instruction tuning, so it may require additional fine-tuning for specific applications.
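Because the checkpoint is a base model without instruction tuning, applications that need instruction following typically involve supervised fine-tuning on your own data. The sketch below shows one possible approach using the plain `transformers` `Trainer` with a causal-LM data collator; the dataset file, sequence length, and hyperparameters are placeholders for illustration, not recommendations from SB Intuitions.

```python
# Hypothetical supervised fine-tuning sketch; dataset and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "sbintuitions/sarashina2.2-1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The collator pads batches, so make sure a pad token is defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder dataset: a plain-text file with one training example per line.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Causal-LM collation: labels are derived from input_ids (no masked-LM objective).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sarashina2.2-1b-sft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```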