SFR-Iterative-DPO-LLaMA-3-8B-R

TriAiExperiments

State-of-the-art 8B parameter LLaMA-3-based model using iterative DPO training, outperforming larger models on key benchmarks including MT-Bench and Alpaca-Eval-V2

  • Parameter Count: 8B
  • Base Architecture: LLaMA-3
  • Training Method: Iterative DPO
  • License: cc-by-nc-nd-3.0
  • Model Hub: Hugging Face

What is SFR-Iterative-DPO-LLaMA-3-8B-R?

SFR-Iterative-DPO-LLaMA-3-8B-R represents a significant advancement in instruction-tuned language models, developed through an innovative online RLHF (Reinforcement Learning from Human Feedback) approach. This model achieves remarkable performance, surpassing not only similarly-sized models but also many larger open-source alternatives and some proprietary models like GPT-3.5-turbo-0613.

Implementation Details

The model employs a novel DPO-based training recipe that's more efficient and simpler to implement compared to traditional PPO-based approaches. Its online component effectively addresses distribution shifts during policy optimization, resulting in superior performance across multiple benchmarks.

  • Achieves 37.2 on Alpaca-Eval-V2 (significantly higher than baseline)
  • Scores 8.46 on MT-Bench, outperforming models like Mixtral-8x7B-it
  • Shows strong performance in academic benchmarks including GSM-8K (80.7%) and MMLU (65.3%)
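As a concrete illustration, the per-pair preference objective underlying a DPO recipe like this one can be sketched in plain Python. The function and argument names here are ours, and beta is the usual hyperparameter controlling how far the policy may drift from the reference model; this is a generic DPO loss, not the authors' training code:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the trained policy and the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): minimized when the policy prefers the
    # chosen response more strongly than the reference model does
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Unlike PPO, this needs no separate value network or rollout machinery, which is the efficiency advantage the recipe leans on.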

Core Capabilities

  • Advanced instruction following and chat capabilities
  • Strong performance in mathematical reasoning (GSM-8K)
  • Improved truthfulness compared to baseline models
  • Efficient deployment using Hugging Face Transformers library
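Chat-style inference with the Transformers library might look like the following minimal sketch. The repository id below is an assumption based on the model name; substitute the hub entry you actually use, and adjust dtype and generation settings to your hardware:

```python
# Minimal chat inference sketch using Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R"  # assumed hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a prompt with the model's own chat template
messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```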

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness lies in its iterative DPO training approach, which achieves state-of-the-art performance with just 8B parameters, demonstrating that smaller models can be highly competitive when trained effectively.
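At a high level, one round of an iterative (online) DPO recipe alternates sampling, preference ranking, and a DPO update. The sketch below is schematic, with illustrative function names rather than the authors' implementation:

```python
def iterative_dpo(policy, ref_model, prompts, rank, dpo_update, rounds=3):
    """Schematic online iterative-DPO loop (illustrative only).

    Each round samples fresh response pairs from the *current* policy,
    ranks them with an external preference signal, and applies a DPO
    update against the frozen reference model. Re-sampling every round
    is what counters the distribution shift that offline DPO suffers from.
    """
    for _ in range(rounds):
        pairs = []
        for x in prompts:
            a, b = policy(x), policy(x)        # two on-policy samples
            chosen, rejected = rank(x, a, b)   # preference / reward model
            pairs.append((x, chosen, rejected))
        policy = dpo_update(policy, ref_model, pairs)
    return policy
```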

Q: What are the recommended use cases?

The model is well-suited for instruction-following tasks, chat applications, mathematical reasoning, and general knowledge queries. However, like other instruction-tuned models, it can still produce offensive or unethical content under adversarial prompting, so outputs should be monitored in sensitive applications.
