OLMo-2-0325-32B-Instruct

allenai

OLMo-2-0325-32B-Instruct is a 32B parameter open language model fine-tuned on Tülu 3 dataset, optimized for diverse tasks including MATH and GSM8K.

Property	Value
Parameter Count	32 Billion
License	Apache 2.0
Paper	arXiv:2501.00656
Base Model	OLMo-2-0325-32B-DPO

What is OLMo-2-0325-32B-Instruct?

OLMo-2-0325-32B-Instruct is an advanced language model developed by Allen Institute for AI, representing a significant evolution in open-source AI. This model is a post-trained variant of OLMo-2 32B, specifically fine-tuned using supervised learning on the Tülu 3 dataset, followed by DPO and RLVR training phases.

Implementation Details

The model leverages a sophisticated training approach combining multiple techniques: supervised fine-tuning, Direct Preference Optimization (DPO), and Reinforcement Learning from Value Rewards (RLVR). It achieves impressive performance scores across various benchmarks, including 87.6% on GSM8K and 85.6% on IFEval.

Trained using 5 8xH100 nodes
Implements a specific chat template format
Supports intermediate checkpoint access during RLVR training
Uses a default system prompt identifying as OLMo 2

Core Capabilities

Strong performance on mathematical reasoning (GSM8K, MATH)
Advanced instruction following and chat capabilities
High safety scores (85.9% on safety benchmarks)
Competitive performance against other leading models in diverse tasks

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its fully open nature, comprehensive documentation, and strong performance across diverse tasks. It's particularly notable for its transparency in training procedures and intermediate checkpoints availability.

Q: What are the recommended use cases?

The model excels in mathematical reasoning, instruction following, and general language tasks. It's particularly suitable for research and educational applications, though users should be aware of its limitations regarding safety-critical applications.