Sarashina2-8x70B
| Property | Value |
|---|---|
| Parameter Count | 465B |
| Tensor Type | BF16 |
| License | Sarashina Model NonCommercial License |
| Research Paper | Sparse Upcycling Paper |
What is sarashina2-8x70b?
Sarashina2-8x70B is a large language model developed by SB Intuitions with roughly 465 billion parameters. It was created by applying sparse upcycling to the dense Sarashina2-70B base model, turning it into a Mixture-of-Experts (MoE) architecture with eight experts (hence the 8x70B name). The model was trained on a mix of Japanese and English web corpora, which makes it particularly well suited to bilingual applications.
Implementation Details
The model uses a SentencePiece tokenizer based on unigram language modeling with byte fallback, so raw Japanese sentences can be fed to it directly, without any pre-tokenization step. Inference requires substantial compute: either 16x H100 or 16x A100 80GB GPUs. A minimal loading sketch is given after the list below.
- Tokenization of raw Japanese text with no pre-tokenization requirement
- BF16 precision for efficient computation
- Built with the sparse upcycling methodology
- Mixture-of-Experts architecture for efficient scaling of model capacity
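The following is a minimal inference sketch, assuming the checkpoint is published under the Hugging Face ID `sbintuitions/sarashina2-8x70b` and that the standard `transformers` `AutoModelForCausalLM`/`AutoTokenizer` interfaces apply; the repository name and the use of `device_map="auto"` for multi-GPU sharding are assumptions, not official usage instructions.

```python
# Minimal sketch: load the model in BF16 and tokenize raw Japanese text.
# The Hugging Face ID below is assumed; adjust it to the actual repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sbintuitions/sarashina2-8x70b"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights as released
    device_map="auto",           # shard across the available GPUs
)

# The SentencePiece tokenizer accepts raw Japanese text directly;
# no separate pre-tokenization (e.g. morphological analysis) is needed.
prompt = "日本の四季について説明してください。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```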
Core Capabilities
- Bilingual processing in Japanese and English
- Raw text processing without pre-tokenization
- Large-scale language understanding and generation
- Efficient parameter utilization through MoE architecture
Frequently Asked Questions
Q: What makes this model unique?
Its distinguishing feature is the roughly 465B-parameter Mixture-of-Experts design obtained by sparse upcycling the dense Sarashina2-70B model, combined with bilingual coverage of Japanese and English.
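To illustrate the idea behind sparse upcycling, the sketch below initializes each expert of an MoE layer as a copy of a pretrained dense feed-forward block and adds a freshly initialized router. It is a conceptual illustration under assumed names (`UpcycledMoE`, top-2 routing), not SB Intuitions' actual training code.

```python
# Conceptual sketch of sparse upcycling: experts start as copies of the dense
# FFN from the pretrained model; only the router is initialized from scratch.
import copy
import torch
import torch.nn as nn

class UpcycledMoE(nn.Module):
    def __init__(self, dense_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Each expert is a deep copy of the pretrained dense feed-forward block.
        self.experts = nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(num_experts))
        # The router is new and learns to dispatch tokens during continued training.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        scores = self.router(x).softmax(dim=-1)
        weights, indices = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e
                if mask.any():
                    # Weighted contribution of expert e for the tokens routed to it.
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```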
Q: What are the recommended use cases?
The released checkpoint is a pretrained base model and has not been instruction-tuned. Users should fine-tune it for their target applications and incorporate safety measures and alignment with human preferences before deploying it.