# OLMo-2-1124-7B-DPO
| Property | Value |
|---|---|
| Base Model | OLMo-2-7B-SFT |
| License | Apache 2.0 |
| Paper | Tülu 3 Paper |
| Primary Language | English |
## What is OLMo-2-1124-7B-DPO?
OLMo-2-1124-7B-DPO is a language model developed by the Allen Institute for AI (AI2) as part of its OLMo (Open Language Model) series. As a fully open model, it represents a notable step for open-source AI: it was trained with Direct Preference Optimization (DPO) on top of a checkpoint that had first been supervised fine-tuned on the Tülu 3 dataset.
## Implementation Details
The model was trained with a length-normalized DPO approach using a learning rate of 8e-7, a beta of 5, and an effective batch size of 128. It uses a maximum sequence length of 2048 tokens and a linear learning-rate schedule with a 0.1 warmup ratio.
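As a rough illustration, the sketch below shows what a length-normalized DPO objective looks like, assuming per-token log-probabilities from the policy and a frozen reference (SFT) model are already available. The function name, tensor shapes, and helper are illustrative assumptions, not AI2's training code; only the beta value is taken from the hyperparameters above.

```python
import torch.nn.functional as F

def length_normalized_dpo_loss(
    policy_chosen_logps,    # (batch, seq) per-token log-probs of chosen responses under the policy
    policy_rejected_logps,  # (batch, seq) per-token log-probs of rejected responses under the policy
    ref_chosen_logps,       # (batch, seq) same quantities under the frozen reference (SFT) model
    ref_rejected_logps,
    chosen_mask,            # (batch, seq) 1.0 on response tokens, 0.0 on prompt/padding
    rejected_mask,
    beta: float = 5.0,      # beta reported for this model
):
    # Length normalization: average log-probs over response tokens instead of
    # summing them, which removes the usual bias toward longer responses.
    def mean_logps(logps, mask):
        return (logps * mask).sum(-1) / mask.sum(-1)

    policy_logratio = mean_logps(policy_chosen_logps, chosen_mask) - mean_logps(policy_rejected_logps, rejected_mask)
    ref_logratio = mean_logps(ref_chosen_logps, chosen_mask) - mean_logps(ref_rejected_logps, rejected_mask)

    # Standard DPO objective applied to the length-normalized log-ratios.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```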
- Trained on a custom preference dataset mix
- Uses a chat template with user/assistant turns
- Works with the standard HuggingFace transformers pipeline (see the loading example after this list)
- Optimized for both conversational and analytical tasks
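Because the model ships with a chat template and works with the standard transformers API, a typical loading and generation flow looks like the sketch below (assuming a recent transformers install with accelerate for `device_map="auto"`); the prompt and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The tokenizer's chat template handles the user/assistant formatting.
messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```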
## Core Capabilities
- Strong performance on mathematical reasoning (GSM8k: 82.4%)
- High safety scores (81.5% on safety benchmarks)
- Effective on general knowledge tasks (MMLU: 63.4%)
- Improved chat capabilities through DPO training
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its fully open release and strong performance across diverse tasks: it is particularly strong in mathematical reasoning and safety while remaining competitive on general knowledge benchmarks.
Q: What are the recommended use cases?
This model is particularly well-suited for research and educational applications, especially in scenarios requiring mathematical reasoning, structured problem-solving, and safe conversational interactions. It's designed to handle both chat-based interactions and specific task-oriented applications.