suzume-llama-3-8B-multilingual

Maintained By
lightblue

Suzume LLaMA-3 8B Multilingual

PropertyValue
Parameter Count8.03B
Base ModelMeta-LLaMA-3-8B-Instruct
LicenseLLaMA-3
PaperarXiv:2405.12612
Training Data Size~83,000 conversations

What is suzume-llama-3-8B-multilingual?

Suzume is an advanced multilingual language model built on LLaMA-3's 8B architecture, specifically fine-tuned to maintain high performance across multiple languages while preserving the base model's English capabilities. The model was trained on approximately 83,000 multilingual conversations, making it particularly effective for non-English interactions.

Implementation Details

The model was trained using 4 A100 (80GB) GPUs for 2.5 hours, utilizing advanced techniques like gradient checkpointing and flash attention. It implements a cosine learning rate scheduler with a 1e-5 learning rate and employs 8-bit Adam optimization.

  • Trained on diverse datasets including tagengo-gpt4, instruction_ja, and openchat_sharegpt4_dataset
  • Uses 8192 sequence length with sample packing
  • Implements BF16 precision training

Core Capabilities

  • Multilingual chat support across German, French, Japanese, Russian, and Chinese
  • Competitive MT-Bench scores (7.26-8.19) across different languages
  • Maintains strong English performance (7.73 MT-Bench score)
  • Efficient inference through vLLM integration

Frequently Asked Questions

Q: What makes this model unique?

This model uniquely combines LLaMA-3's powerful architecture with multilingual capabilities, achieving near-native performance across multiple languages while maintaining the base model's English proficiency. It outperforms other multilingual models in its size class on various benchmarks.

Q: What are the recommended use cases?

The model excels in multilingual conversational tasks, making it ideal for chatbots, customer service applications, and general-purpose language understanding across multiple languages. It's particularly strong in German, French, Japanese, Russian, and Chinese interactions.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.