Sarashina2.2-1B
| Property | Value |
|---|---|
| Parameter Count | 1 Billion |
| Training Data | 10 Trillion Tokens |
| License | MIT |
| Author | SB Intuitions |
| Model URL | huggingface.co/sbintuitions/sarashina2.2-1b |
What is sarashina2.2-1b?
Sarashina2.2-1B is a large language model developed by SB Intuitions, designed to handle both Japanese and English language processing. The model has approximately 1 billion parameters and was trained in three phases on a corpus of 10 trillion tokens.
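The checkpoint is distributed through the Hugging Face Hub (URL above), so it can be loaded with the `transformers` library. The snippet below is a minimal sketch rather than official usage guidance from SB Intuitions; it assumes a recent `transformers` release with PyTorch installed, and the dtype choice is illustrative.

```python
# Minimal loading sketch (assumes transformers + PyTorch are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sbintuitions/sarashina2.2-1b"  # repo name from the model URL above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Sanity-check the advertised parameter count (~1 billion).
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params / 1e9:.2f}B")
```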
Implementation Details
The training process consisted of three distinct phases: initial pre-training on multilingual web corpora (Japanese, English, and code), synthetic-data training to strengthen mathematical and coding capabilities, and final refinement on specialized application tasks. On Japanese benchmarks, the model achieves 47.2% on NIILC, 38.2% on JMMLU, 39.6% on MGSM-ja, and 20.7% on JHumanEval.
- Three-phase training methodology
- Optimized for Japanese and English language processing
- Enhanced mathematical and coding capabilities
- Competitive performance on multiple Japanese benchmarks
Core Capabilities
- Bilingual text generation in Japanese and English
- Mathematical problem-solving capabilities
- Code generation and understanding
- Strong performance on Japanese QA tasks
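To illustrate the bilingual text-generation capability listed above, here is a hedged sketch of completion-style inference using the `transformers` `pipeline` API. The Japanese prompt and the sampling parameters are arbitrary examples, not values recommended by the model authors.

```python
# Completion-style generation sketch; prompt and sampling settings are illustrative only.
from transformers import pipeline

generator = pipeline("text-generation", model="sbintuitions/sarashina2.2-1b")

# Japanese prompt meaning roughly "Speaking of Osaka's specialty food, ..."
# As a base model, it continues the text rather than answering a chat-style instruction.
prompt = "大阪の名物といえば"
outputs = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.9)
print(outputs[0]["generated_text"])
```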
Frequently Asked Questions
Q: What makes this model unique?
The model's three-phase training approach and joint optimization for Japanese and English, combined with its strong performance on mathematical and coding tasks, set it apart from other models in its parameter range.
Q: What are the recommended use cases?
The model is well-suited for Japanese-English language processing tasks, mathematical problem-solving, and coding applications. However, users should note that this is a pre-trained model without instruction tuning, so it may require additional fine-tuning for specific applications.
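Because the checkpoint is a base model without instruction tuning, applications that need instruction following typically involve supervised fine-tuning on your own data. The sketch below shows one possible approach using the plain `transformers` `Trainer` with a causal-LM data collator; the dataset file, sequence length, and hyperparameters are placeholders for illustration, not recommendations from SB Intuitions.

```python
# Hypothetical supervised fine-tuning sketch; dataset and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "sbintuitions/sarashina2.2-1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The collator pads batches, so make sure a pad token is defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder dataset: a plain-text file with one training example per line.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Causal-LM collation: labels are derived from input_ids (no masked-LM objective).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sarashina2.2-1b-sft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```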