OLMo-2-1124-7B-DPO

Maintained By: allenai

Base Model: OLMo-2-7B-SFT
License: Apache 2.0
Paper: Tülu 3 Paper
Primary Language: English

What is OLMo-2-1124-7B-DPO?

OLMo-2-1124-7B-DPO is a language model developed by the Allen Institute for AI (Ai2) as part of its OLMo (Open Language Model) series. Starting from a checkpoint supervised fine-tuned on the Tülu 3 dataset, it adds a Direct Preference Optimization (DPO) training stage, yielding a fully open preference-tuned model.

Implementation Details

The model was trained with a length-normalized DPO objective using a learning rate of 8e-7, a beta of 5, and an effective batch size of 128. It supports a maximum sequence length of 2048 tokens and was trained with a linear learning-rate schedule and a 0.1 warmup ratio.
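
For intuition, below is a minimal PyTorch sketch of a length-normalized DPO loss consistent with the hyperparameters above; the function name and tensor layout are illustrative assumptions rather than the actual training code.

```python
import torch
import torch.nn.functional as F

def length_normalized_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # summed log-probs of chosen responses under the policy
    policy_rejected_logps: torch.Tensor,  # summed log-probs of rejected responses under the policy
    ref_chosen_logps: torch.Tensor,       # same, under the frozen reference (SFT) model
    ref_rejected_logps: torch.Tensor,
    chosen_lengths: torch.Tensor,         # token counts of the chosen responses
    rejected_lengths: torch.Tensor,       # token counts of the rejected responses
    beta: float = 5.0,                    # beta value reported for this model
) -> torch.Tensor:
    # Length normalization: divide each summed log-ratio by response
    # length so that longer responses are not implicitly favored.
    chosen_logratio = (policy_chosen_logps - ref_chosen_logps) / chosen_lengths
    rejected_logratio = (policy_rejected_logps - ref_rejected_logps) / rejected_lengths
    # Standard DPO objective: widen the margin between chosen and rejected.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```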

  • Trained on a custom preference dataset mix
  • Implements a chat template with a user/assistant format
  • Supports the standard Hugging Face Transformers pipeline (see the usage sketch after this list)
  • Optimized for both conversational and analytical tasks
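
As a usage sketch, loading the model and applying its chat template through Transformers might look like the following; the Hugging Face repo ID allenai/OLMo-2-1124-7B-DPO and the example prompt are assumptions.

```python
# pip install torch transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B-DPO"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # reduces memory vs. fp32 on supported hardware
    device_map="auto",
)

# The tokenizer's built-in chat template produces the user/assistant format.
messages = [{"role": "user", "content": "A train covers 120 km in 1.5 hours. What is its average speed?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Keep prompt plus generation within the 2048-token sequence limit noted above.
output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```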

Core Capabilities

  • Strong performance on mathematical reasoning (GSM8k: 82.4%)
  • High safety scores (81.5% on safety benchmarks)
  • Effective on general knowledge tasks (MMLU: 63.4%)
  • Improved chat capabilities through DPO training

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its fully open development and strong performance across diverse tasks: it excels in mathematical reasoning and safety while remaining competitive on general knowledge benchmarks.

Q: What are the recommended use cases?

This model is particularly well-suited for research and educational applications, especially in scenarios requiring mathematical reasoning, structured problem-solving, and safe conversational interactions. It's designed to handle both chat-based interactions and specific task-oriented applications.
