suzume-llama-3-8B-multilingual

suzume-llama-3-8B-multilingual

lightblue

Multilingual 8B parameter LLaMA-3 variant optimized for multiple languages including Japanese, German, French, Russian and Chinese, with strong MT-Bench scores.

PropertyValue
Parameter Count8.03B
Base ModelMeta-LLaMA-3-8B-Instruct
LicenseLLaMA-3
PaperarXiv:2405.12612
Training Data Size~83,000 conversations

What is suzume-llama-3-8B-multilingual?

Suzume is an advanced multilingual language model built on LLaMA-3's 8B architecture, specifically fine-tuned to maintain high performance across multiple languages while preserving the base model's English capabilities. The model was trained on approximately 83,000 multilingual conversations, making it particularly effective for non-English interactions.

Implementation Details

The model was trained using 4 A100 (80GB) GPUs for 2.5 hours, utilizing advanced techniques like gradient checkpointing and flash attention. It implements a cosine learning rate scheduler with a 1e-5 learning rate and employs 8-bit Adam optimization.

  • Trained on diverse datasets including tagengo-gpt4, instruction_ja, and openchat_sharegpt4_dataset
  • Uses 8192 sequence length with sample packing
  • Implements BF16 precision training

Core Capabilities

  • Multilingual chat support across German, French, Japanese, Russian, and Chinese
  • Competitive MT-Bench scores (7.26-8.19) across different languages
  • Maintains strong English performance (7.73 MT-Bench score)
  • Efficient inference through vLLM integration

Frequently Asked Questions

Q: What makes this model unique?

This model uniquely combines LLaMA-3's powerful architecture with multilingual capabilities, achieving near-native performance across multiple languages while maintaining the base model's English proficiency. It outperforms other multilingual models in its size class on various benchmarks.

Q: What are the recommended use cases?

The model excels in multilingual conversational tasks, making it ideal for chatbots, customer service applications, and general-purpose language understanding across multiple languages. It's particularly strong in German, French, Japanese, Russian, and Chinese interactions.

Socials
PromptLayer
Company
All services online
Location IconPromptLayer is located in the heart of New York City
PromptLayer © 2026