sarashina2-8x70b

sbintuitions

A 465B-parameter Japanese-English language model built with the sparse upcycling technique. Inference requires 16x H100 or 16x A100 80GB GPUs.

Property          Value
Parameter Count   465B
Tensor Type       BF16
License           Sarashina Model NonCommercial License
Research Paper    Sparse Upcycling Paper

What is sarashina2-8x70b?

Sarashina2-8x70B is a language model developed by SB Intuitions with over 465 billion parameters. It is built by applying sparse upcycling to the base Sarashina2-70B model, producing a Mixture-of-Experts (MoE) architecture. Sparse upcycling initializes the experts of an MoE model from the weights of an already-trained dense model instead of starting from random weights. The model is trained on a mix of Japanese and English web corpora, which makes it well suited to bilingual applications.
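The idea behind sparse upcycling can be sketched on a toy feed-forward block. This is a minimal illustration in plain Python, not SB Intuitions' implementation: the layer sizes, the ReLU activation, the router initialization, and `top_k=2` are all assumptions, and the choice of 8 experts is only suggested by the "8x" in the model name. It shows one useful property: because every expert starts as a copy of the dense block, the MoE layer initially reproduces the dense output no matter which experts the router picks.

```python
import copy
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dense_ffn(params, x):
    """Toy two-layer feed-forward block with ReLU (assumed activation)."""
    hidden = [max(0.0, v) for v in matvec(params["w_in"], x)]
    return matvec(params["w_out"], hidden)


def upcycle(dense_params, num_experts, d_model, rng):
    """Copy the dense FFN into every expert and add a new, small router."""
    experts = [copy.deepcopy(dense_params) for _ in range(num_experts)]
    router = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
              for _ in range(num_experts)]
    return {"experts": experts, "router": router}


def moe_ffn(moe, x, top_k):
    """Route x to the top_k experts and mix their outputs with softmax weights."""
    logits = matvec(moe["router"], x)
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in chosen)
    weights = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(weights)
    out = [0.0] * len(moe["experts"][0]["w_out"])
    for i, w in zip(chosen, weights):
        expert_out = dense_ffn(moe["experts"][i], x)
        out = [o + (w / total) * y for o, y in zip(out, expert_out)]
    return out, chosen


rng = random.Random(0)
d_model, d_hidden = 4, 8  # hypothetical toy sizes
dense = {
    "w_in": [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_hidden)],
    "w_out": [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_model)],
}
moe = upcycle(dense, num_experts=8, d_model=d_model, rng=rng)

x = [rng.gauss(0, 1) for _ in range(d_model)]
dense_out = dense_ffn(dense, x)
moe_out, chosen = moe_ffn(moe, x, top_k=2)
max_diff = max(abs(a - b) for a, b in zip(dense_out, moe_out))
print("experts used:", chosen, "max difference from dense output:", max_diff)
```

The experts only diverge from one another once training continues after the copy, which is where the extra capacity of the MoE model comes from.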

Implementation Details

The model uses a SentencePiece tokenizer with a unigram language model and byte fallback, so raw sentences can be passed to the tokenizer directly, with no pre-tokenization step for Japanese text. Inference requires substantial compute: either 16x H100 or 16x A100 80GB GPUs.

  • Tokenization that needs no Japanese pre-tokenization
  • BF16 precision for efficient computation
  • Built using sparse upcycling methodology
  • Mixture-of-Experts architecture for enhanced performance
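A back-of-the-envelope calculation helps put the hardware requirement in context. The sketch below uses the 465B parameter count and BF16 tensor type stated above, and assumes 80 GB of memory per GPU and decimal gigabytes. It counts weights only, so activations, the KV cache, and framework overhead are not included.

```python
import math

PARAMS = 465e9            # parameter count from the model card
BYTES_PER_PARAM = 2       # BF16 stores each parameter in 2 bytes
GPU_COUNT = 16            # stated inference requirement
GPU_MEMORY_GB = 80        # assumed memory per GPU

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
total_gpu_gb = GPU_COUNT * GPU_MEMORY_GB
per_gpu_weights_gb = weights_gb / GPU_COUNT
min_gpus_for_weights = math.ceil(weights_gb / GPU_MEMORY_GB)

print(f"weights: {weights_gb:.0f} GB")                       # weights: 930 GB
print(f"total GPU memory: {total_gpu_gb} GB")                # total GPU memory: 1280 GB
print(f"weights per GPU: {per_gpu_weights_gb:.3f} GB")       # weights per GPU: 58.125 GB
print(f"GPUs needed for weights alone: {min_gpus_for_weights}")  # 12
```

Under these assumptions the weights alone exceed the 640 GB available on eight 80 GB GPUs, which is consistent with the model needing a 16-GPU setup.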

Core Capabilities

  • Bilingual processing in Japanese and English
  • Raw text processing without pre-tokenization
  • Large-scale language understanding and generation
  • Efficient parameter utilization through MoE architecture
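Byte fallback is what lets the tokenizer accept arbitrary raw text: a character missing from the vocabulary is split into its UTF-8 bytes instead of becoming an unknown token. The toy sketch below illustrates only that mechanism. The vocabulary is hypothetical, and the greedy longest-match segmentation is a stand-in chosen for brevity; it is not the unigram language model that the real tokenizer uses.

```python
def encode_with_byte_fallback(text, vocab):
    """Greedy longest-match segmentation; unknown characters become byte tokens."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary piece starts here: fall back to UTF-8 bytes.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens


def decode(tokens):
    """Reassemble text from vocabulary pieces and <0xNN> byte tokens."""
    data = bytearray()
    for tok in tokens:
        if len(tok) == 6 and tok.startswith("<0x") and tok.endswith(">"):
            data.append(int(tok[3:5], 16))
        else:
            data.extend(tok.encode("utf-8"))
    return data.decode("utf-8")


# Hypothetical toy vocabulary; the real vocabulary is far larger.
vocab = {"日本", "語", "の", "モデル", "を"}
text = "日本語のモデルを試す"  # raw sentence, no word segmentation applied

tokens = encode_with_byte_fallback(text, vocab)
print(tokens)
print(decode(tokens) == text)
```

The last two characters are not in the toy vocabulary, so each is emitted as three byte tokens, and decoding still recovers the original sentence exactly.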

Frequently Asked Questions

Q: What makes this model unique?

The model is distinguished by its 465B parameter count, reached by applying sparse upcycling to Sarashina2-70B, and by its bilingual training on Japanese and English text.

Q: What are the recommended use cases?

The model has not yet been instruction-tuned, so it is not intended to be used as an instruction-following assistant as released. Users should fine-tune it for their specific application, taking human preferences and safety considerations into account.
