Suzume Llama 3 8B Multilingual
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Base Model | Meta-Llama-3-8B-Instruct |
| License | Llama 3 Community License |
| Paper | arXiv:2405.12612 |
| Training Data Size | ~83,000 conversations |
What is suzume-llama-3-8B-multilingual?
Suzume is a multilingual chat model built on the Llama 3 8B Instruct architecture, fine-tuned to perform well across multiple languages while preserving the base model's English capabilities. It was trained on approximately 83,000 multilingual conversations, making it particularly effective for non-English interactions.
Implementation Details
The model was trained for roughly 2.5 hours on 4 A100 (80 GB) GPUs, using gradient checkpointing and Flash Attention. Training used a cosine learning-rate schedule with a peak learning rate of 1e-5 and 8-bit Adam optimization; a hedged sketch of these settings follows the list below.
- Trained on diverse datasets including tagengo-gpt4, instruction_ja, and openchat_sharegpt4_dataset
- Uses 8192 sequence length with sample packing
- Implements BF16 precision training
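As a rough, non-authoritative sketch, here is how the reported hyperparameters might map onto Hugging Face `TrainingArguments`. The optimizer key, output path, epoch count, and batch size are assumptions rather than values from the Suzume recipe; Flash Attention, the 8192-token sequence length, and sample packing are configured in the model and data pipeline rather than here.

```python
from transformers import TrainingArguments

# Hedged sketch: maps the hyperparameters reported above onto Hugging Face
# TrainingArguments. Values marked "assumption" are illustrative, not taken
# from the Suzume training recipe.
args = TrainingArguments(
    output_dir="suzume-llama-3-8B-multilingual",  # assumption: placeholder path
    bf16=True,                        # BF16 precision training
    learning_rate=1e-5,               # reported peak learning rate
    lr_scheduler_type="cosine",       # cosine learning-rate schedule
    optim="adamw_bnb_8bit",           # 8-bit Adam via bitsandbytes
    gradient_checkpointing=True,      # trade recompute for GPU memory
    num_train_epochs=1,               # assumption: epoch count not stated here
    per_device_train_batch_size=2,    # assumption: sized for 4x A100 80 GB
)
```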
Core Capabilities
- Multilingual chat support across German, French, Japanese, Russian, and Chinese
- Competitive MT-Bench scores (7.26-8.19) across different languages
- Maintains strong English performance (7.73 MT-Bench score)
- Efficient inference through vLLM integration
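As an illustration of vLLM-based inference, the minimal sketch below assumes the Hugging Face repository id `lightblue/suzume-llama-3-8B-multilingual` and greedy sampling settings; it is not taken from an official usage guide. Because the base model is instruction-tuned, each conversation is formatted with the Llama 3 chat template before generation.

```python
from vllm import LLM, SamplingParams

# Hedged sketch of multilingual chat via vLLM. The repository id and
# sampling settings are assumptions for illustration.
llm = LLM(model="lightblue/suzume-llama-3-8B-multilingual")
sampling_params = SamplingParams(temperature=0.0, max_tokens=100)

# Format the conversation with the model's chat template.
tokenizer = llm.get_tokenizer()
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": "Bonjour, comment allez-vous ?"}],
        tokenize=False,
        add_generation_prompt=True,
    )
]

outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```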
Frequently Asked Questions
Q: What makes this model unique?
This model combines Llama 3's instruction-tuned 8B base with multilingual fine-tuning, achieving MT-Bench scores in German, French, Japanese, Russian, and Chinese comparable to its English score while preserving the base model's English proficiency. It outperforms other multilingual models in its size class on these benchmarks.
Q: What are the recommended use cases?
The model excels at multilingual conversational tasks, making it well suited to chatbots, customer-service applications, and general-purpose language understanding across languages. It is particularly strong in German, French, Japanese, Russian, and Chinese interactions; a minimal chatbot sketch follows below.
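As a sketch of the chatbot use case, the example below runs a single Japanese chat turn through the transformers `pipeline` API. The repository id, dtype, and generation settings are assumptions for illustration, not an official recipe.

```python
import torch
from transformers import pipeline

# Hedged sketch of a single multilingual chatbot turn. The repository id
# is assumed to be the Hugging Face id for this model.
chat = pipeline(
    "text-generation",
    model="lightblue/suzume-llama-3-8B-multilingual",
    torch_dtype=torch.bfloat16,   # assumption: matches BF16 training
    device_map="auto",
)

# Japanese prompt: "Please recommend sightseeing spots in Tokyo."
messages = [{"role": "user", "content": "東京でおすすめの観光スポットを教えてください。"}]
reply = chat(messages, max_new_tokens=128)

# The chat pipeline returns the full conversation; the last message
# is the assistant's reply.
print(reply[0]["generated_text"][-1]["content"])
```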