Sarashina2-8x70B
| Property | Value |
|---|---|
| Parameter Count | 465B |
| Tensor Type | BF16 |
| License | Sarashina Model NonCommercial License |
| Research Paper | Sparse Upcycling Paper |
What is sarashina2-8x70b?
Sarashina2-8x70B is a large language model developed by SB Intuitions with roughly 465 billion parameters. It was created by applying sparse upcycling to the dense Sarashina2-70B base model, turning it into a Mixture-of-Experts (MoE) architecture with eight experts (hence the 8x70B name). The model was trained on a mix of Japanese and English web corpora, which makes it particularly well suited to bilingual applications.
Implementation Details
The model uses a SentencePiece tokenizer based on unigram language modeling with byte fallback, so raw Japanese sentences can be fed to it directly, without any pre-tokenization step. Inference requires substantial compute: either 16x H100 or 16x A100 80GB GPUs. A minimal loading sketch is given after the list below.
- Tokenization of raw Japanese text with no pre-tokenization requirement
- BF16 precision for efficient computation
- Built with the sparse upcycling methodology
- Mixture-of-Experts architecture for efficient scaling of model capacity
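The following is a minimal inference sketch, assuming the checkpoint is published under the Hugging Face ID `sbintuitions/sarashina2-8x70b` and that the standard `transformers` `AutoModelForCausalLM`/`AutoTokenizer` interfaces apply; the repository name and the use of `device_map="auto"` for multi-GPU sharding are assumptions, not official usage instructions.

```python
# Minimal sketch: load the model in BF16 and tokenize raw Japanese text.
# The Hugging Face ID below is assumed; adjust it to the actual repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sbintuitions/sarashina2-8x70b"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights as released
    device_map="auto",           # shard across the available GPUs
)

# The SentencePiece tokenizer accepts raw Japanese text directly;
# no separate pre-tokenization (e.g. morphological analysis) is needed.
prompt = "日本の四季について説明してください。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```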
Core Capabilities
- Bilingual processing in Japanese and English
- Raw text processing without pre-tokenization
- Large-scale language understanding and generation
- Efficient parameter utilization through MoE architecture
Frequently Asked Questions
Q: What makes this model unique?
Its distinguishing feature is the roughly 465B-parameter Mixture-of-Experts design obtained by sparse upcycling the dense Sarashina2-70B model, combined with bilingual coverage of Japanese and English.
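To illustrate the idea behind sparse upcycling, the sketch below initializes each expert of an MoE layer as a copy of a pretrained dense feed-forward block and adds a freshly initialized router. It is a conceptual illustration under assumed names (`UpcycledMoE`, top-2 routing), not SB Intuitions' actual training code.

```python
# Conceptual sketch of sparse upcycling: experts start as copies of the dense
# FFN from the pretrained model; only the router is initialized from scratch.
import copy
import torch
import torch.nn as nn

class UpcycledMoE(nn.Module):
    def __init__(self, dense_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Each expert is a deep copy of the pretrained dense feed-forward block.
        self.experts = nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(num_experts))
        # The router is new and learns to dispatch tokens during continued training.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        scores = self.router(x).softmax(dim=-1)
        weights, indices = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e
                if mask.any():
                    # Weighted contribution of expert e for the tokens routed to it.
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```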
Q: What are the recommended use cases?
The released checkpoint is a pretrained base model and has not been instruction-tuned. Users should fine-tune it for their target applications and incorporate safety measures and alignment with human preferences before deploying it.