Llama-3.1-Swallow-8B-Instruct-v0.3
| Property | Value |
|---|---|
| Parameter Count | 8 Billion |
| Base Model | Meta Llama 3.1 |
| License | META LLAMA 3.1 COMMUNITY LICENSE |
| Languages | Japanese, English |
| Model Hub | Hugging Face |
What is Llama-3.1-Swallow-8B-Instruct-v0.3?
Llama-3.1-Swallow-8B-Instruct-v0.3 is an advanced language model that enhances the Japanese language capabilities of Meta's Llama 3.1 while maintaining strong English performance. This model represents the latest iteration in the Swallow series, trained on approximately 200 billion tokens from a Japanese web corpus, Wikipedia articles, and specialized content.
Implementation Details
The model was developed through a two-stage process: continual pre-training on the base Llama 3.1 model, followed by supervised fine-tuning using carefully curated Japanese instruction datasets. It leverages the Megatron-LM library and demonstrates state-of-the-art performance for 8B parameter models on Japanese MT-Bench.
- Achieves a score of 0.6424 on Japanese MT-Bench, outperforming previous Swallow versions by 8.4 points
- Incorporates specialized instruction tuning with Gemma-2-LMSYS-Chat-1M-Synth and Swallow-Magpie-Ultra datasets
- Optimized for both single-turn and multi-turn conversations in Japanese
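To make the single-turn usage concrete, here is a minimal inference sketch using Hugging Face Transformers and the model's chat template. The repository ID `tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3` and the sampling settings are assumptions on my part and should be checked against the model page; they are not taken from this card.

```python
# Minimal single-turn inference sketch (repo ID and sampling settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B weights in bf16 fit in roughly 16 GB of GPU memory
    device_map="auto",
)

# A Japanese instruction, formatted through the model's chat template.
messages = [
    {"role": "user", "content": "日本の四季について簡潔に説明してください。"},  # "Briefly explain Japan's four seasons."
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```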
Core Capabilities
- Superior Japanese language understanding and generation
- Strong performance in Japanese tasks including question answering, summarization, and code generation
- Maintained English language capabilities across various benchmarks
- Enhanced conversation abilities with detailed and helpful responses
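Since the instruction tuning targets multi-turn conversation as well, follow-up questions are handled by passing the running conversation history back through the chat template. The sketch below reuses the `tokenizer` and `model` objects from the previous example; the prompts and the placeholder assistant turn are purely illustrative.

```python
# Multi-turn sketch: append each exchange to the history and re-apply the chat
# template so the model sees the full conversation.
# Assumes `tokenizer` and `model` from the previous example are in scope.
history = [
    {"role": "user", "content": "東京工業大学について教えてください。"},
    # Placeholder for a previous model reply, kept here to illustrate history handling.
    {"role": "assistant", "content": "東京工業大学は、東京にある国立の理工系大学です。"},
    {"role": "user", "content": "その大学の有名な研究分野は何ですか？"},
]

input_ids = tokenizer.apply_chat_template(
    history, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, temperature=0.6, top_p=0.9, do_sample=True)
reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Keep the new reply in the history so the next turn has full context.
history.append({"role": "assistant", "content": reply})
```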
Frequently Asked Questions
Q: What makes this model unique?
This model enhances Japanese language capabilities while preserving English performance, achieving state-of-the-art results among 8B-parameter models on Japanese benchmarks.
Q: What are the recommended use cases?
The model excels in Japanese-English bilingual applications, including question answering, summarization, code generation, and conversational AI. It's particularly effective for tasks requiring detailed Japanese language understanding and generation.
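For serving-oriented use cases such as conversational AI or batch summarization, a throughput-focused engine like vLLM is a common choice. The sketch below is an illustration under stated assumptions, not an official recipe: it presumes a recent vLLM release that exposes `LLM.chat` and reuses the assumed repository ID from the earlier example.

```python
# Serving-style sketch with vLLM (assumes a recent release providing LLM.chat;
# the repo ID and sampling settings are assumptions, not from this card).
from vllm import LLM, SamplingParams

llm = LLM(model="tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3")
params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=512)

# A Japanese summarization request, one of the use cases listed above.
article = (
    "大規模言語モデルは、継続事前学習と指示チューニングによって、"
    "特定の言語や分野への適応性を高めることができる。"
)
messages = [
    {"role": "user", "content": "次の文章を一文で要約してください。\n" + article},
]

outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```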