OLMo-2-1124-7B-DPO

Maintained By
allenai

| Property | Value |
|---|---|
| Base Model | OLMo-2-7B-SFT |
| License | Apache 2.0 |
| Language | English |
| Paper | Forthcoming |
| Training Method | Direct Preference Optimization (DPO) |

What is OLMo-2-1124-7B-DPO?

OLMo-2-1124-7B-DPO is a language model developed by the Allen Institute for AI (AI2) as part of its OLMo (Open Language Model) series. It is a DPO-trained variant of the OLMo 2 7B SFT model, optimized on preference data to improve its instruction-following capabilities and performance across diverse tasks.

Implementation Details

The model employs a length-normalized DPO training approach with carefully tuned hyperparameters: a learning rate of 8e-7, a beta value of 5, and an effective batch size of 128. Training used a maximum sequence length of 2048 tokens and a linear learning rate schedule with a 0.1 warmup ratio.
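As a rough illustration of what length-normalized DPO optimizes, the sketch below computes the per-example loss from policy and reference log-probabilities, with each sequence log-probability divided by its token count before the usual DPO sigmoid loss is applied. This is a minimal scalar sketch, not AI2's training code; the function name and argument layout are illustrative.

```python
import math

def ln_dpo_loss(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp,
                chosen_len, rejected_len, beta=5.0):
    """Length-normalized DPO loss for a single preference pair (sketch).

    Each argument *_logp is the summed token log-probability of the full
    chosen/rejected response under the policy or reference model; it is
    divided by the response length before forming the DPO margin.
    """
    # Length-normalize the sequence log-probabilities.
    pi_c = policy_chosen_logp / chosen_len
    pi_r = policy_rejected_logp / rejected_len
    ref_c = ref_chosen_logp / chosen_len
    ref_r = ref_rejected_logp / rejected_len
    # DPO margin: how much more the policy prefers the chosen response
    # than the reference model does, relative to the rejected response.
    margin = (pi_c - ref_c) - (pi_r - ref_r)
    # Negative log-sigmoid of the scaled margin (beta=5 per the model card).
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

The loss shrinks as the policy's (length-normalized) preference for the chosen response grows beyond the reference model's; a length mismatch between responses no longer skews the margin, which is the point of the length normalization.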

  • Built on the Transformers architecture
  • Trained on OLMo-specific variant of Tülu 3 dataset
  • Implements a standardized chat template for consistent interaction
  • Supports both general text generation and specialized tasks
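To show what a standardized chat template amounts to in practice, here is a minimal pure-Python sketch that wraps messages in role tags and appends a generation prompt. The specific `<|role|>` token format below is an assumption for illustration; the authoritative template ships with the tokenizer and should be applied via `tokenizer.apply_chat_template` in `transformers`.

```python
def format_chat(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into a single prompt string.

    NOTE: the <|role|> delimiters are hypothetical stand-ins for the model's
    real special tokens; use the tokenizer's own chat template in production.
    """
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}\n")
    if add_generation_prompt:
        # Cue the model to respond as the assistant.
        parts.append("<|assistant|>\n")
    return "".join(parts)
```

A consistent template like this is what lets the DPO-trained model reliably distinguish user turns from its own responses during inference.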

Core Capabilities

  • Strong performance on mathematical reasoning (GSM8k: 82.4%)
  • Robust safety measures (Safety score: 81.5%)
  • Effective instruction following (AlpacaEval: 29.9%)
  • Balanced performance across diverse tasks including DROP and MMLU

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for being fully open-source while achieving competitive performance against other leading models in its size class. It's particularly notable for its balanced performance across various tasks and its transparent training process.

Q: What are the recommended use cases?

The model is well-suited for research and educational applications, particularly excelling in mathematical reasoning, instruction following, and general text generation tasks. It's designed to be a versatile tool while maintaining strong safety considerations.
