Llama-3-Instruct-8B-SPPO-Iter3

Maintained By: UCLA-AGI

Parameter Count: 8.03B
License: Apache-2.0
Base Model: meta-llama/Meta-Llama-3-8B-Instruct
Research Paper: Self-Play Preference Optimization

What is Llama-3-Instruct-8B-SPPO-Iter3?

Llama-3-Instruct-8B-SPPO-Iter3 is a language model developed by UCLA-AGI using Self-Play Preference Optimization (SPPO). It is the third SPPO iteration applied to the Meta-Llama-3-8B-Instruct base model, trained on prompts from the UltraFeedback dataset to strengthen its instruction-following capabilities.
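As a rough, simplified paraphrase of the update described in the Self-Play Preference Optimization paper (notation abridged here; see the paper for the exact objective), each iteration t fits a new policy by regressing log-probability ratios toward estimated preference probabilities:

    \pi_{t+1} = \arg\min_{\pi}\ \mathbb{E}_{x \sim X,\ y \sim \pi_t(\cdot \mid x)}
      \left[ \left( \log \frac{\pi(y \mid x)}{\pi_t(y \mid x)}
        - \eta \left( \hat{P}(y \succ \pi_t \mid x) - \tfrac{1}{2} \right) \right)^2 \right]

Here \hat{P} is a preference probability estimated by an external preference model and \eta is a step-size hyperparameter; running this update three times yields the Iter3 checkpoint.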

Implementation Details

Training used a learning rate of 5e-07 with the RMSProp optimizer and a linear learning-rate schedule, and was run across 8 devices with DeepSpeed ZeRO-3 optimization (a configuration sketch follows the list below).

  • Trained on synthetic datasets derived from openbmb/UltraFeedback
  • Implements three-iteration SPPO methodology
  • Uses BF16 tensor type for efficient computation
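
The sketch below shows one plausible way to express these reported settings with transformers.TrainingArguments. It is illustrative only: the output path and DeepSpeed config filename are hypothetical, and the authors' actual training scripts may differ.

    # Illustrative mapping of the reported hyperparameters onto
    # transformers.TrainingArguments; not the authors' actual script.
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="sppo-iter3",       # hypothetical output path
        learning_rate=5e-7,            # reported learning rate
        optim="rmsprop",               # reported optimizer (an accepted optim
                                       # choice in recent transformers versions)
        lr_scheduler_type="linear",    # reported linear LR schedule
        bf16=True,                     # reported BF16 tensor type
        deepspeed="ds_zero3.json",     # hypothetical ZeRO-3 config; the run
                                       # would be launched across 8 devices
    )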

Core Capabilities

  • Achieves 68.28% accuracy on IFEval (0-shot)
  • Reaches 29.74% normalized accuracy on BBH (3-shot)
  • Shows consistent gains over the earlier SPPO iterations, with a 39.85% win rate on AlpacaEval
  • Performs well on additional benchmarks, including ARC-Challenge (65.19%) and HellaSwag (80.86%)
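
For reference, the model can be queried with the standard Hugging Face transformers chat workflow. The prompt and generation settings below are illustrative, not recommendations from the authors:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the BF16 training dtype
        device_map="auto",
    )

    # Build a chat prompt with the model's built-in chat template.
    messages = [{"role": "user", "content": "Explain SPPO in two sentences."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=256)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))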

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness lies in its iterative SPPO training approach, showing progressive improvements across three iterations, particularly in instruction-following tasks and general language understanding.

Q: What are the recommended use cases?

This model is well-suited for instruction-following tasks, general text generation, and applications that require strong English-language understanding; it is strongest in scenarios that demand precise adherence to instructions.
