Llama-3.1-Swallow-8B-Instruct-v0.3

Maintained By
tokyotech-llm

Llama-3.1-Swallow-8B-Instruct-v0.3

PropertyValue
Parameter Count8 Billion
Base ModelMeta Llama 3.1
LicenseMETA LLAMA 3.1 COMMUNITY LICENSE
LanguagesJapanese, English
Model HubHugging Face

What is Llama-3.1-Swallow-8B-Instruct-v0.3?

Llama-3.1-Swallow-8B-Instruct-v0.3 is an advanced language model that enhances the Japanese language capabilities of Meta's Llama 3.1 while maintaining strong English performance. This model represents the latest iteration in the Swallow series, trained on approximately 200 billion tokens from Japanese web corpus, Wikipedia articles, and specialized content.

Implementation Details

The model was developed through a two-stage process: continual pre-training on the base Llama 3.1 model, followed by supervised fine-tuning using carefully curated Japanese instruction datasets. It leverages the Megatron-LM library and demonstrates state-of-the-art performance for 8B parameter models on Japanese MT-Bench.

  • Achieves 0.6424 score on Japanese MT-Bench, outperforming previous versions by 8.4 points
  • Incorporates specialized instruction tuning with Gemma-2-LMSYS-Chat-1M-Synth and Swallow-Magpie-Ultra datasets
  • Optimized for both single-turn and multi-turn conversations in Japanese

Core Capabilities

  • Superior Japanese language understanding and generation
  • Strong performance in Japanese tasks including question answering, summarization, and code generation
  • Maintained English language capabilities across various benchmarks
  • Enhanced conversation abilities with detailed and helpful responses

Frequently Asked Questions

Q: What makes this model unique?

This model uniquely combines enhanced Japanese language capabilities with maintained English performance, achieving SOTA results for 8B parameter models on Japanese benchmarks while preserving strong performance on English tasks.

Q: What are the recommended use cases?

The model excels in Japanese-English bilingual applications, including question answering, summarization, code generation, and conversational AI. It's particularly effective for tasks requiring detailed Japanese language understanding and generation.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.