OLMo-2-0325-32B-Instruct

Maintained By
allenai

OLMo-2-0325-32B-Instruct

PropertyValue
Parameter Count32 Billion
LicenseApache 2.0
PaperarXiv:2501.00656
Base ModelOLMo-2-0325-32B-DPO

What is OLMo-2-0325-32B-Instruct?

OLMo-2-0325-32B-Instruct is an advanced language model developed by Allen Institute for AI, representing a significant evolution in open-source AI. This model is a post-trained variant of OLMo-2 32B, specifically fine-tuned using supervised learning on the Tülu 3 dataset, followed by DPO and RLVR training phases.

Implementation Details

The model leverages a sophisticated training approach combining multiple techniques: supervised fine-tuning, Direct Preference Optimization (DPO), and Reinforcement Learning from Value Rewards (RLVR). It achieves impressive performance scores across various benchmarks, including 87.6% on GSM8K and 85.6% on IFEval.

  • Trained using 5 8xH100 nodes
  • Implements a specific chat template format
  • Supports intermediate checkpoint access during RLVR training
  • Uses a default system prompt identifying as OLMo 2

Core Capabilities

  • Strong performance on mathematical reasoning (GSM8K, MATH)
  • Advanced instruction following and chat capabilities
  • High safety scores (85.9% on safety benchmarks)
  • Competitive performance against other leading models in diverse tasks

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its fully open nature, comprehensive documentation, and strong performance across diverse tasks. It's particularly notable for its transparency in training procedures and intermediate checkpoints availability.

Q: What are the recommended use cases?

The model excels in mathematical reasoning, instruction following, and general language tasks. It's particularly suitable for research and educational applications, though users should be aware of its limitations regarding safety-critical applications.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.