Llama-3.1-Swallow-8B-Instruct-v0.3
| Property | Value |
|---|---|
| Parameter Count | 8 Billion |
| Base Model | Meta Llama 3.1 |
| License | META LLAMA 3.1 COMMUNITY LICENSE |
| Languages | Japanese, English |
| Model Hub | Hugging Face |
What is Llama-3.1-Swallow-8B-Instruct-v0.3?
Llama-3.1-Swallow-8B-Instruct-v0.3 is an advanced language model that enhances the Japanese language capabilities of Meta's Llama 3.1 while maintaining strong English performance. This model represents the latest iteration in the Swallow series, trained on approximately 200 billion tokens from a Japanese web corpus, Wikipedia articles, and specialized content.
Implementation Details
The model was developed through a two-stage process: continual pre-training on the base Llama 3.1 model, followed by supervised fine-tuning using carefully curated Japanese instruction datasets. It leverages the Megatron-LM library and demonstrates state-of-the-art performance for 8B parameter models on Japanese MT-Bench.
- Achieves a score of 0.6424 on Japanese MT-Bench, outperforming previous Swallow versions by 8.4 points
- Incorporates specialized instruction tuning with Gemma-2-LMSYS-Chat-1M-Synth and Swallow-Magpie-Ultra datasets
- Optimized for both single-turn and multi-turn conversations in Japanese
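To make the single-turn usage concrete, here is a minimal inference sketch using Hugging Face Transformers and the model's chat template. The repository ID `tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3` and the sampling settings are assumptions on my part and should be checked against the model page; they are not taken from this card.

```python
# Minimal single-turn inference sketch (repo ID and sampling settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B weights in bf16 fit in roughly 16 GB of GPU memory
    device_map="auto",
)

# A Japanese instruction, formatted through the model's chat template.
messages = [
    {"role": "user", "content": "日本の四季について簡潔に説明してください。"},  # "Briefly explain Japan's four seasons."
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```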
Core Capabilities
- Superior Japanese language understanding and generation
- Strong performance in Japanese tasks including question answering, summarization, and code generation
- Maintained English language capabilities across various benchmarks
- Enhanced conversation abilities with detailed and helpful responses
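Since the instruction tuning targets multi-turn conversation as well, follow-up questions are handled by passing the running conversation history back through the chat template. The sketch below reuses the `tokenizer` and `model` objects from the previous example; the prompts and the placeholder assistant turn are purely illustrative.

```python
# Multi-turn sketch: append each exchange to the history and re-apply the chat
# template so the model sees the full conversation.
# Assumes `tokenizer` and `model` from the previous example are in scope.
history = [
    {"role": "user", "content": "東京工業大学について教えてください。"},
    # Placeholder for a previous model reply, kept here to illustrate history handling.
    {"role": "assistant", "content": "東京工業大学は、東京にある国立の理工系大学です。"},
    {"role": "user", "content": "その大学の有名な研究分野は何ですか？"},
]

input_ids = tokenizer.apply_chat_template(
    history, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, temperature=0.6, top_p=0.9, do_sample=True)
reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Keep the new reply in the history so the next turn has full context.
history.append({"role": "assistant", "content": reply})
```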
Frequently Asked Questions
Q: What makes this model unique?
This model enhances Japanese language capabilities while preserving English performance, achieving state-of-the-art results among 8B-parameter models on Japanese benchmarks.
Q: What are the recommended use cases?
The model excels in Japanese-English bilingual applications, including question answering, summarization, code generation, and conversational AI. It's particularly effective for tasks requiring detailed Japanese language understanding and generation.
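For serving-oriented use cases such as conversational AI or batch summarization, a throughput-focused engine like vLLM is a common choice. The sketch below is an illustration under stated assumptions, not an official recipe: it presumes a recent vLLM release that exposes `LLM.chat` and reuses the assumed repository ID from the earlier example.

```python
# Serving-style sketch with vLLM (assumes a recent release providing LLM.chat;
# the repo ID and sampling settings are assumptions, not from this card).
from vllm import LLM, SamplingParams

llm = LLM(model="tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3")
params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=512)

# A Japanese summarization request, one of the use cases listed above.
article = (
    "大規模言語モデルは、継続事前学習と指示チューニングによって、"
    "特定の言語や分野への適応性を高めることができる。"
)
messages = [
    {"role": "user", "content": "次の文章を一文で要約してください。\n" + article},
]

outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```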