RakutenAI-7B-instruct

Rakuten

A 7B-parameter bilingual Japanese-English instruction-tuned LLM based on the Mistral architecture, reporting the highest average score on Japanese LM-Harness benchmarks among similar models

Property	Value
Parameters	7B
Architecture	Mistral-based
License	Apache License 2.0
Languages	Japanese, English
Paper	arXiv:2403.15484

What is RakutenAI-7B-instruct?

RakutenAI-7B-instruct is a bilingual instruction-tuned language model for Japanese and English. Built on the Mistral architecture, it extends the tokenizer vocabulary from 32k to 48k tokens so that Japanese text can be represented with fewer tokens per character. The model reports the highest average score on Japanese LM-Harness benchmarks among similar models while remaining competitive on English tasks.
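The effect of a larger vocabulary can be measured as characters per token: the more characters each token covers, the fewer tokens a Japanese sentence needs. The sketch below is a minimal illustration, not the authors' evaluation code. The helper names (`chars_per_token`, `byte_fallback_tokenize`, `char_level_tokenize`) and the sample sentence are hypothetical, and the two toy tokenizers only model the extremes of a vocabulary with no Japanese entries versus one with an entry for every character.

```python
from typing import Callable, List


def chars_per_token(text: str, tokenize: Callable[[str], List[str]]) -> float:
    """Average number of characters covered by one token (higher is more efficient)."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("tokenizer returned no tokens")
    return len(text) / len(tokens)


# Toy stand-ins for real tokenizers (hypothetical, for illustration only).
def byte_fallback_tokenize(text: str) -> List[str]:
    # Models a vocabulary without Japanese entries: each UTF-8 byte becomes a token.
    return [f"<0x{b:02X}>" for b in text.encode("utf-8")]


def char_level_tokenize(text: str) -> List[str]:
    # Models a vocabulary that holds every character as a single token.
    return list(text)


sample = "楽天は日本の企業です"  # arbitrary sample sentence, 10 characters
print(chars_per_token(sample, byte_fallback_tokenize))  # about 0.33
print(chars_per_token(sample, char_level_tokenize))     # 1.0
```

To compare real tokenizers, the same function can be given the `tokenize` method of a Hugging Face tokenizer in place of the toy functions; the actual ratios for the 32k and 48k vocabularies are not reproduced here.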

Implementation Details

The model is built upon the Mistral-7B-v0.1 pre-trained checkpoint and has been fine-tuned on a mix of datasets including JSNLI, RTE, KUCI, BELEBELE, JCS, JNLI, Dolly-15K, and OpenAssistant1. The instruction-tuning mix is intended to keep performance strong on both Japanese and English tasks.

Extended vocabulary (48k tokens) for more efficient tokenization of Japanese text
Instruction-tuning on a diverse mix of datasets
Highest average score on Japanese LM-Harness metrics among similar models
Strong performance on English benchmarks compared to similar models

Core Capabilities

93.03% accuracy on JCS benchmark
90.39% accuracy on JNLI tasks
96.00% accuracy on MARC-ja
Strong performance on English tasks including ARC (58.62%) and HellaSwag (82.70%)
Balanced bilingual capabilities with state-of-the-art Japanese performance

Frequently Asked Questions

Q: What makes this model unique?

RakutenAI-7B-instruct stands out for its performance on Japanese language tasks while maintaining strong English capabilities. It achieves this through an expanded vocabulary and a carefully curated instruction-tuning process, which makes it a good fit for bilingual applications.

Q: What are the recommended use cases?

The model is well-suited for bilingual applications requiring strong Japanese and English language understanding, including text generation, question answering, and general language understanding tasks. Its strongest results are on Japanese, with English performance remaining competitive.
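For the text generation and question answering use cases above, a minimal sketch using the Hugging Face transformers library might look as follows. Several things here are assumptions not stated in this summary: the repository ID `Rakuten/RakutenAI-7B-instruct`, the `USER:` / `ASSISTANT:` prompt layout with its system sentence, and the helper names `build_prompt` and `generate`. Check the official model card for the exact prompt format before relying on it.

```python
# Assumed system prompt and turn markers; verify against the official model card.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(user_input: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Format a single-turn prompt in the assumed USER/ASSISTANT layout."""
    return f"{system_prompt} USER: {user_input} ASSISTANT:"


def generate(
    user_input: str,
    model_id: str = "Rakuten/RakutenAI-7B-instruct",  # assumed repository ID
    max_new_tokens: int = 256,
) -> str:
    """Load the model and return the generated answer (downloads the weights)."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" requires the accelerate package.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer(build_prompt(user_input), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens so only the newly generated answer is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("日本で一番高い山は何ですか")` would return the model's answer as a string. Greedy decoding (`do_sample=False`) is used here only to keep the sketch deterministic; sampling settings are a matter of preference.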