SFR-Iterative-DPO-LLaMA-3-8B-R

Maintained By
TriAiExperiments

  • Parameter Count: 8B
  • Base Architecture: LLaMA-3
  • Training Method: Iterative DPO
  • License: cc-by-nc-nd-3.0
  • Model Hub: Hugging Face

What is SFR-Iterative-DPO-LLaMA-3-8B-R?

SFR-Iterative-DPO-LLaMA-3-8B-R is an instruction-tuned language model developed through an online RLHF (Reinforcement Learning from Human Feedback) approach. Despite its modest 8B-parameter size, it surpasses not only similarly sized models but also many larger open-source alternatives and some proprietary models, such as GPT-3.5-turbo-0613.

Implementation Details

The model employs a novel DPO-based training recipe that is more efficient and simpler to implement than traditional PPO-based approaches. Its online component addresses the distribution shift that arises as the policy is optimized, yielding strong performance across multiple benchmarks (a minimal sketch of the underlying DPO loss follows the benchmark list below):

  • Achieves 37.2 on Alpaca-Eval-V2 (significantly higher than baseline)
  • Scores 8.46 on MT-Bench, outperforming models like Mixtral-8x7B-it
  • Shows strong performance in academic benchmarks including GSM-8K (80.7%) and MMLU (65.3%)
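For context, here is a minimal sketch of the standard DPO objective that recipes like this one build on. The tensor names and the beta value are illustrative; this is not the authors' exact training code.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: push the policy to prefer the chosen response
    over the rejected one, with beta scaling the implicit KL penalty
    toward the frozen reference model."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log(sigmoid(x)) computed stably via logsigmoid
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```

Because this reduces preference learning to a classification-style loss, it avoids the reward-model rollouts and value-function training that make PPO pipelines comparatively heavy.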

Core Capabilities

  • Advanced instruction following and chat capabilities
  • Strong performance in mathematical reasoning (GSM-8K)
  • Improved truthfulness compared to baseline models
  • Efficient deployment via the Hugging Face Transformers library (see the usage sketch below)
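A minimal usage sketch with Transformers follows. The repository id and generation settings are assumptions based on the public Hugging Face release; check the model card for the exact chat template and recommended parameters.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo id; verify against the model card.
model_id = "Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```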

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness lies in its iterative DPO training approach, which achieves state-of-the-art performance with just 8B parameters, demonstrating that smaller models can be highly competitive when trained effectively.
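To make "iterative" concrete, here is a high-level sketch of the data-collection step in a typical online iterative-DPO loop. The helper callables are hypothetical stand-ins, not the authors' actual pipeline.

```python
from typing import Callable, List, Tuple

def collect_preference_pairs(
    generate: Callable[[str], str],      # samples one response from the current policy
    score: Callable[[str, str], float],  # reward-model score for (prompt, response)
    prompts: List[str],
    samples_per_prompt: int = 4,
) -> List[Tuple[str, str, str]]:
    """One round of online data collection: sample from the *current*
    policy (rather than a fixed dataset), rank candidates with a reward
    model, and keep the best/worst pair for the next DPO update.
    Re-sampling each round is what counters distribution shift."""
    pairs = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(samples_per_prompt)]
        ranked = sorted(candidates, key=lambda r: score(prompt, r), reverse=True)
        pairs.append((prompt, ranked[0], ranked[-1]))  # (prompt, chosen, rejected)
    return pairs
```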

Q: What are the recommended use cases?

The model is well-suited to instruction-following tasks, chat applications, mathematical reasoning, and general knowledge queries. However, as with other instruction-tuned models, it may still generate offensive or unethical content under adversarial prompting, so users should apply appropriate safeguards.
