Suzume Llama 3 8B Multilingual
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Base Model | Meta-Llama-3-8B-Instruct |
| License | Llama 3 Community License |
| Paper | arXiv:2405.12612 |
| Training Data Size | ~83,000 conversations |
What is suzume-llama-3-8B-multilingual?
Suzume is a multilingual chat model built on the Llama 3 8B Instruct architecture, fine-tuned to perform well across multiple languages while preserving the base model's English capabilities. It was trained on approximately 83,000 multilingual conversations, making it particularly effective for non-English interactions.
Implementation Details
The model was trained for roughly 2.5 hours on 4 A100 (80 GB) GPUs, using gradient checkpointing and Flash Attention. Training used a cosine learning-rate schedule with a peak learning rate of 1e-5 and 8-bit Adam optimization; a hedged sketch of these settings follows the list below.
- Trained on diverse datasets including tagengo-gpt4, instruction_ja, and openchat_sharegpt4_dataset
- Uses 8192 sequence length with sample packing
- Implements BF16 precision training
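As a rough, non-authoritative sketch, here is how the reported hyperparameters might map onto Hugging Face `TrainingArguments`. The optimizer key, output path, epoch count, and batch size are assumptions rather than values from the Suzume recipe; Flash Attention, the 8192-token sequence length, and sample packing are configured in the model and data pipeline rather than here.

```python
from transformers import TrainingArguments

# Hedged sketch: maps the hyperparameters reported above onto Hugging Face
# TrainingArguments. Values marked "assumption" are illustrative, not taken
# from the Suzume training recipe.
args = TrainingArguments(
    output_dir="suzume-llama-3-8B-multilingual",  # assumption: placeholder path
    bf16=True,                        # BF16 precision training
    learning_rate=1e-5,               # reported peak learning rate
    lr_scheduler_type="cosine",       # cosine learning-rate schedule
    optim="adamw_bnb_8bit",           # 8-bit Adam via bitsandbytes
    gradient_checkpointing=True,      # trade recompute for GPU memory
    num_train_epochs=1,               # assumption: epoch count not stated here
    per_device_train_batch_size=2,    # assumption: sized for 4x A100 80 GB
)
```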
Core Capabilities
- Multilingual chat support across German, French, Japanese, Russian, and Chinese
- Competitive MT-Bench scores (7.26-8.19) across different languages
- Maintains strong English performance (7.73 MT-Bench score)
- Efficient inference through vLLM integration
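As an illustration of vLLM-based inference, the minimal sketch below assumes the Hugging Face repository id `lightblue/suzume-llama-3-8B-multilingual` and greedy sampling settings; it is not taken from an official usage guide. Because the base model is instruction-tuned, each conversation is formatted with the Llama 3 chat template before generation.

```python
from vllm import LLM, SamplingParams

# Hedged sketch of multilingual chat via vLLM. The repository id and
# sampling settings are assumptions for illustration.
llm = LLM(model="lightblue/suzume-llama-3-8B-multilingual")
sampling_params = SamplingParams(temperature=0.0, max_tokens=100)

# Format the conversation with the model's chat template.
tokenizer = llm.get_tokenizer()
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": "Bonjour, comment allez-vous ?"}],
        tokenize=False,
        add_generation_prompt=True,
    )
]

outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```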
Frequently Asked Questions
Q: What makes this model unique?
This model combines Llama 3's instruction-tuned 8B base with multilingual fine-tuning, achieving MT-Bench scores in German, French, Japanese, Russian, and Chinese comparable to its English score while preserving the base model's English proficiency. It outperforms other multilingual models in its size class on these benchmarks.
Q: What are the recommended use cases?
The model excels at multilingual conversational tasks, making it well suited to chatbots, customer-service applications, and general-purpose language understanding across languages. It is particularly strong in German, French, Japanese, Russian, and Chinese interactions; a minimal chatbot sketch follows below.
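As a sketch of the chatbot use case, the example below runs a single Japanese chat turn through the transformers `pipeline` API. The repository id, dtype, and generation settings are assumptions for illustration, not an official recipe.

```python
import torch
from transformers import pipeline

# Hedged sketch of a single multilingual chatbot turn. The repository id
# is assumed to be the Hugging Face id for this model.
chat = pipeline(
    "text-generation",
    model="lightblue/suzume-llama-3-8B-multilingual",
    torch_dtype=torch.bfloat16,   # assumption: matches BF16 training
    device_map="auto",
)

# Japanese prompt: "Please recommend sightseeing spots in Tokyo."
messages = [{"role": "user", "content": "東京でおすすめの観光スポットを教えてください。"}]
reply = chat(messages, max_new_tokens=128)

# The chat pipeline returns the full conversation; the last message
# is the assistant's reply.
print(reply[0]["generated_text"][-1]["content"])
```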