Llama-3-Instruct-8B-SPPO-Iter3
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | Apache-2.0 |
| Base Model | meta-llama/Meta-Llama-3-8B-Instruct |
| Research Paper | Self-Play Preference Optimization |
What is Llama-3-Instruct-8B-SPPO-Iter3?
Llama-3-Instruct-8B-SPPO-Iter3 is a language model developed by UCLA-AGI using Self-Play Preference Optimization (SPPO). It is the third iteration of SPPO fine-tuning applied to the Meta-Llama-3-8B-Instruct base model, trained using the UltraFeedback dataset to strengthen instruction-following capabilities.
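As context for how SPPO works, the following is a rough sketch of the per-iteration objective as described in the SPPO paper, not an excerpt from the training code used for this checkpoint. At iteration t, a new policy is fit by regressing the log-probability ratio on samples drawn from the current policy π_t, where P̂(y ≻ π_t | x) is the estimated probability that response y is preferred over the current policy's responses and η is a scaling hyperparameter:

$$
\pi_{t+1} \approx \arg\min_{\theta}\; \mathbb{E}_{x \sim \mathcal{X},\, y \sim \pi_t(\cdot \mid x)}
\left[\left(\log \frac{\pi_{\theta}(y \mid x)}{\pi_t(y \mid x)} \;-\; \eta\left(\hat{P}(y \succ \pi_t \mid x) - \tfrac{1}{2}\right)\right)^{2}\right]
$$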
Implementation Details
Training used a learning rate of 5e-07, the RMSProp optimizer, and a linear learning-rate schedule, and was run across 8 devices with DeepSpeed ZeRO-3 optimization. A minimal loading sketch in Python follows the list below.
- Trained on synthetic datasets derived from openbmb/UltraFeedback
- Implements three-iteration SPPO methodology
- Uses BF16 tensor type for efficient computation
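Because the checkpoint follows the standard Llama-3-Instruct format, it should load with the Hugging Face transformers library. The sketch below assumes the repository id UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3 (inferred from the model name above) and loads the weights in BF16 to match the tensor type noted in the list; adjust the id or dtype as needed.

```python
# Minimal usage sketch (assumed repo id; verify against the hosted checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3"  # assumed repository path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the training tensor type
    device_map="auto",
)

# Llama-3-Instruct checkpoints ship a chat template; use it for prompting.
messages = [{"role": "user", "content": "Explain SPPO training in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```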
Core Capabilities
- Achieves 68.28% accuracy on IFEval (0-Shot)
- Shows 29.74% normalized accuracy on BBH (3-Shot)
- Demonstrates consistent improvement over previous iterations, with a 39.85% win rate on AlpacaEval
- Performs well on additional benchmarks, including arc_challenge (65.19%) and hellaswag (80.86%)
Frequently Asked Questions
Q: What makes this model unique?
What sets this model apart is its iterative SPPO training: preference optimization is applied over three successive iterations, yielding progressive gains in instruction following and general language understanding.
Q: What are the recommended use cases?
This model is well-suited for instruction-following tasks, general text generation, and English-language applications that require strong language understanding. Its IFEval result in particular suggests it is a good fit for scenarios where precise adherence to instructions matters.