# Suzume LLaMA-3 8B Multilingual ORPO Borda Half
| Property | Value |
|---|---|
| Base Model | LLaMA-3 8B |
| Training Method | ORPO (Odds Ratio Preference Optimization) |
| License | Non-commercial |
| Languages | English, Chinese, French, German, Japanese, Russian |
## What is suzume-llama-3-8B-multilingual-orpo-borda-half?
This model is a multilingual variant of LLaMA-3 8B, fine-tuned with ORPO on the Mitsu dataset. It was trained only on the top 50% of responses by ranking consistency (the "Borda half" of the name) and performs strongly across six languages, improving notably on its base version and competing favorably with models such as Starling-LM-7B-beta and GPT-3.5-turbo in multilingual evaluation.
## Implementation Details
The model was trained with the Axolotl framework, using optimizations including gradient checkpointing, an 8-bit Adam optimizer, and cosine learning-rate scheduling. Training used a sequence length of 8192 and flash attention for improved efficiency. The key hyperparameters are listed below, followed by a rough sketch of an equivalent setup.
- Learning rate: 8e-6 with cosine scheduling
- Gradient accumulation steps: 8
- Training epochs: 1
- Validation loss: 0.0935
- BF16 mixed precision training
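For illustration, the hyperparameters above can be mapped onto a TRL `ORPOTrainer` run roughly as follows. This is a minimal sketch, not the authors' actual Axolotl configuration; the base-model and dataset ids, the `optim` string, and anything not listed above are assumptions.

```python
# Illustrative sketch only: the model was actually trained with Axolotl.
# This maps the reported hyperparameters onto TRL's ORPOTrainer; ids and
# any value not stated in the card are assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference data with prompt/chosen/rejected columns, as ORPO expects.
train_dataset = load_dataset("lightblue/mitsu_tophalf_borda", split="train")  # assumed dataset id

config = ORPOConfig(
    output_dir="suzume-orpo-borda-half",
    learning_rate=8e-6,             # from the card
    lr_scheduler_type="cosine",     # from the card
    gradient_accumulation_steps=8,  # from the card
    num_train_epochs=1,             # from the card
    max_length=8192,                # sequence length from the card
    bf16=True,                      # BF16 mixed precision
    optim="adamw_bnb_8bit",         # 8-bit Adam
    gradient_checkpointing=True,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` on older TRL releases
)
trainer.train()
```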
## Core Capabilities
- Strong multilingual performance with MT-Bench scores exceeding 7.5 in most languages
- Strongest results in Russian (8.94) and English (7.98)
- Balanced response generation across different linguistic contexts
- Efficient handling of context-heavy prompts
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's distinctive feature is its training recipe: ORPO applied to a curated subset of the Mitsu dataset, specifically the 50% of training prompts whose responses were most consistently ranked across judge models (aggregated via Borda count, hence the model name). This selective strategy produced improved performance across the evaluated languages. A toy illustration of the selection idea follows.
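The sketch below shows one way a Borda count can aggregate per-judge rankings and keep the half of the prompts where judges agree most. All names are hypothetical, and the consistency measure (winner's Borda margin) is an assumption; the real Mitsu pipeline may define it differently.

```python
# Hypothetical sketch: aggregate judge rankings with a Borda count, then
# keep the half of the prompts where judges agree most on the winner.

def borda_scores(rankings):
    """rankings: one list per judge, best response first (same candidates
    for every judge). A response at position p among n candidates earns
    n - 1 - p points from that judge."""
    n = len(rankings[0])
    scores = {resp: 0 for resp in rankings[0]}
    for ranking in rankings:
        for position, resp in enumerate(ranking):
            scores[resp] += n - 1 - position
    return scores

def select_top_half(prompt_rankings):
    """prompt_rankings: {prompt: [judge rankings]}. Consistency here is the
    Borda margin of the winner over the runner-up (an assumption)."""
    margins, winners = {}, {}
    for prompt, rankings in prompt_rankings.items():
        ranked = sorted(borda_scores(rankings).items(),
                        key=lambda kv: kv[1], reverse=True)
        winners[prompt] = ranked[0][0]
        margins[prompt] = ranked[0][1] - ranked[1][1]
    keep = sorted(margins, key=margins.get, reverse=True)[: len(margins) // 2]
    return {p: winners[p] for p in keep}

# Toy example: two judges ranking three responses for each of two prompts.
example = {
    "p1": [["a", "b", "c"], ["a", "c", "b"]],  # judges agree: "a" wins clearly
    "p2": [["x", "y", "z"], ["z", "y", "x"]],  # judges disagree: zero margin
}
print(select_top_half(example))  # {'p1': 'a'}
```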
**Q: What are the recommended use cases?**
The model is well suited to multilingual applications that require high-quality response generation. However, because its license is non-commercial (inherited from the Command R/R+ outputs used to build the training data), it is limited to research and other non-commercial use.
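A minimal inference sketch, assuming the model is published on Hugging Face under the id `lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half` (not confirmed by this card) and follows the Llama 3 chat template:

```python
# Minimal inference sketch; the repo id is an assumption.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-style prompt; the pipeline applies the model's chat template.
# French: "Explain photosynthesis in two sentences."
messages = [{"role": "user",
             "content": "Expliquez la photosynthèse en deux phrases."}]
out = pipe(messages, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"][-1]["content"])
```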