OLMo-2-1124-13B-DPO
| Property | Value |
|---|---|
| Base Model | OLMo-2-13B-SFT |
| License | Apache 2.0 |
| Language | English |
| Paper | Tülu 3 Paper |
What is OLMo-2-1124-13B-DPO?
OLMo-2-1124-13B-DPO is a language model developed by Allen AI and a significant step in the OLMo series. It is a 13B-parameter model that has undergone Direct Preference Optimization (DPO) training after being fine-tuned on a specialized variant of the Tülu 3 dataset. The model performs well across diverse tasks, scoring an average of 61.0% across benchmarks including MMLU, GSM8k, and MATH.
Implementation Details
The model was trained with a length-normalized DPO objective using a learning rate of 8e-7, a DPO beta of 5, and an effective batch size of 128. Training used a maximum sequence length of 2048 tokens and a linear learning-rate schedule with a 0.1 warmup ratio.
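The length-normalized objective can be sketched in a few lines. This is a minimal illustration, not Ai2's training code: the reference-model log-ratio that DPO normally uses is folded into the `logp_*` arguments for brevity, and the input values below are toy numbers.

```python
import math

def length_normalized_dpo_loss(logp_chosen, logp_rejected,
                               len_chosen, len_rejected, beta=5.0):
    """Length-normalized DPO: average the summed log-probabilities
    over the response length before taking the preference margin,
    so longer responses are not preferred merely for accumulating
    more log-probability mass."""
    margin = beta * (logp_chosen / len_chosen - logp_rejected / len_rejected)
    # Standard DPO maps the margin through -log(sigmoid(margin)).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: the chosen response has the better per-token log-prob,
# so the loss is small.
loss = length_normalized_dpo_loss(-40.0, -55.0, 20, 22, beta=5.0)
```

With beta = 5, as in this model's training recipe, even modest per-token margins saturate the sigmoid quickly, which makes the objective relatively tolerant of noisy preference pairs.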
- Built on the Transformer architecture and usable via the Hugging Face Transformers library
- Ships with a dedicated chat template for multi-turn conversation
- Supports both standard text generation and conversational tasks
- Achieves 84.2% on GSM8k and 80.6% on IFEval benchmarks
Core Capabilities
- Advanced mathematical reasoning and problem-solving
- Strong performance on general knowledge tasks (68.5% on MMLU)
- Robust safety considerations (80.6% on safety benchmarks)
- Effective chat and instruction following capabilities
Frequently Asked Questions
Q: What makes this model unique?
OLMo-2-1124-13B-DPO stands out for its training pipeline combining SFT and DPO, and for its strong performance across diverse tasks, particularly mathematics and reasoning. As part of the fully open OLMo series, it is especially valuable for research and educational use.
Q: What are the recommended use cases?
The model is particularly well-suited for research and educational applications, excelling in mathematical reasoning, general knowledge tasks, and conversational interactions. It's designed for safe and responsible AI deployment while maintaining high performance across various benchmarks.