OLMo-2-1124-7B-DPO
| Property | Value |
|---|---|
| Base Model | OLMo-2-7B-SFT |
| License | Apache 2.0 |
| Language | English |
| Paper | Forthcoming |
| Training Method | Direct Preference Optimization (DPO) |
What is OLMo-2-1124-7B-DPO?
OLMo-2-1124-7B-DPO is a language model developed by the Allen Institute for AI (AI2) as part of its OLMo (Open Language Model) series. It is a DPO-trained variant of the OLMo 2 7B SFT model, optimized on preference data to improve its instruction-following ability and performance across diverse tasks.
Implementation Details
The model employs a length-normalized DPO training approach with carefully tuned hyperparameters: a learning rate of 8e-7, a beta of 5, and an effective batch size of 128. It uses a maximum sequence length of 2,048 tokens and a linear learning-rate schedule with a 0.1 warmup ratio; a minimal sketch of the objective appears below.
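As a rough illustration, the following PyTorch sketch shows the shape of a length-normalized DPO objective with the reported beta of 5. The function name and tensor arguments are hypothetical; AI2's actual training code may differ in detail.

```python
import torch
import torch.nn.functional as F

def length_normalized_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # summed token log-probs of chosen responses under the policy
    policy_rejected_logps: torch.Tensor,  # summed token log-probs of rejected responses under the policy
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    chosen_lengths: torch.Tensor,         # chosen response lengths in tokens
    rejected_lengths: torch.Tensor,       # rejected response lengths in tokens
    beta: float = 5.0,                    # beta reported for this model
) -> torch.Tensor:
    # Length-normalize the policy/reference log-ratios per sequence
    chosen_logratio = (policy_chosen_logps - ref_chosen_logps) / chosen_lengths
    rejected_logratio = (policy_rejected_logps - ref_rejected_logps) / rejected_lengths

    # Standard DPO objective applied to the normalized margins
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```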
- Built on the Transformers architecture
- Trained on an OLMo-specific variant of the Tülu 3 preference dataset
- Implements a standardized chat template for consistent interaction (see the usage sketch after this list)
- Supports both general text generation and specialized tasks
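A minimal usage sketch with Hugging Face Transformers is shown below, assuming the model is published under the allenai/OLMo-2-1124-7B-DPO repository id and a recent transformers release with OLMo 2 support; adjust as needed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B-DPO"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a prompt using the model's built-in chat template
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```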
Core Capabilities
- Strong performance on mathematical reasoning (GSM8K: 82.4%)
- Robust safety measures (Safety score: 81.5%)
- Effective instruction following (AlpacaEval: 29.9%)
- Balanced performance across diverse tasks including DROP and MMLU
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for being fully open-source while achieving performance competitive with other leading models in its size class. It is particularly notable for its balanced results across a variety of tasks and its transparent training process.
Q: What are the recommended use cases?
The model is well-suited for research and educational applications, particularly excelling in mathematical reasoning, instruction following, and general text generation tasks. It's designed to be a versatile tool while maintaining strong safety considerations.