suzume-llama-3-8B-multilingual-orpo-borda-half

Maintained By
lightblue

Suzume LLaMA-3 8B Multilingual ORPO Borda Half

Base Model: LLaMA-3 8B
Training Method: ORPO (Odds Ratio Preference Optimization)
License: Non-commercial
Languages: English, Chinese, French, German, Japanese, Russian

What is suzume-llama-3-8B-multilingual-orpo-borda-half?

This model is a multilingual variant of LLaMA-3 8B, fine-tuned with ORPO on the Mitsu preference dataset. As the "borda-half" suffix indicates, training used only the top 50% of responses, those ranked most consistently by Borda count, yielding strong performance across six languages. The model shows notable improvements over its base version and competes favorably with models such as Starling-LM-7B-beta and GPT-3.5-turbo in multilingual capabilities.
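
For orientation, a minimal inference sketch using Hugging Face transformers is shown below. This is an illustrative example rather than official usage code: it assumes the checkpoint ships a standard LLaMA-3 chat template and that bfloat16 weights fit on the available hardware.

```python
# Minimal inference sketch (illustrative; verify chat template against the repo).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The same chat format is used for all six supported languages.
messages = [{"role": "user", "content": "こんにちは。自己紹介してください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```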

Implementation Details

The model was trained using the Axolotl framework with specific optimizations including gradient checkpointing, an 8-bit Adam optimizer, and cosine learning-rate scheduling. Training used a sequence length of 8192 tokens and employed flash attention for improved efficiency. The key hyperparameters are listed below, followed by an illustrative training sketch.

  • Learning rate: 8e-6 with cosine scheduling
  • Gradient accumulation steps: 8
  • Training epochs: 1
  • Validation loss: 0.0935
  • BF16 mixed precision training
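
The released checkpoint was trained with Axolotl, whose full config is not reproduced here. As a rough illustration of the same recipe, the sketch below maps the hyperparameters above onto TRL's ORPOTrainer; the base checkpoint and dataset id are placeholder assumptions, and details such as batch size and warmup will differ from the actual Axolotl run.

```python
# Illustrative only: the actual training used Axolotl, not TRL. Model and
# dataset ids are placeholder assumptions; hyperparameters mirror the list above.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# ORPO expects a preference dataset with "prompt", "chosen", "rejected" columns.
train_dataset = load_dataset("lightblue/mitsu", split="train")  # placeholder id

config = ORPOConfig(
    output_dir="suzume-orpo-borda-half",
    learning_rate=8e-6,                # from the card
    lr_scheduler_type="cosine",        # cosine scheduling
    gradient_accumulation_steps=8,     # from the card
    num_train_epochs=1,                # from the card
    bf16=True,                         # BF16 mixed precision
    gradient_checkpointing=True,
    optim="adamw_bnb_8bit",            # 8-bit Adam
    max_length=8192,                   # training sequence length
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer TRL versions take processing_class= instead
)
trainer.train()
```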

Core Capabilities

  • Strong multilingual performance with MT-Bench scores exceeding 7.5 in most languages
  • Particularly impressive performance in Russian (8.94) and English (7.98)
  • Balanced response generation across different linguistic contexts
  • Efficient long-context handling, backed by the 8192-token training sequence length

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its training approach: ORPO applied to a curated subset of the Mitsu dataset, keeping only the 50% of responses ranked most consistently (by Borda count) across rankings. This selective training strategy has produced strong performance across multiple languages; a toy illustration of Borda-count selection appears below.
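
To make the selection idea concrete, here is a toy Python sketch of Borda-count aggregation over several rankings. It is a simplified illustration under assumed inputs, not the actual Mitsu curation pipeline.

```python
# Toy illustration of Borda-count selection (not the Mitsu pipeline).
from collections import defaultdict

def borda_top_half(rankings):
    """Keep the top 50% of responses by total Borda score.

    `rankings` is a list of rankings; each ranking lists response ids
    ordered best-to-worst. A response earns (n - position) points per
    ranking, so consistently high-ranked responses accumulate the most.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, response_id in enumerate(ranking):
            scores[response_id] += n - position
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered[: max(1, len(ordered) // 2)]

# Three rankings over four candidate responses (ids are illustrative).
rankings = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["b", "a", "d", "c"]]
print(borda_top_half(rankings))  # -> ['a', 'b']
```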

Q: What are the recommended use cases?

The model is particularly well-suited for multilingual applications requiring high-quality response generation. However, due to its non-commercial license (inherited from Command R/R+ training data), it's limited to research and non-commercial applications.
