OLMo-2-1124-13B-DPO

Maintained By
allenai


Property | Value
Base Model | OLMo-2-13B-SFT
License | Apache 2.0
Language | English
Paper | Tülu 3 Paper

What is OLMo-2-1124-13B-DPO?

OLMo-2-1124-13B-DPO is an advanced language model from the Allen Institute for AI (allenai), representing a significant step forward in the OLMo series. It is a 13B-parameter model that has undergone Direct Preference Optimization (DPO) training after supervised fine-tuning on a specialized variant of the Tülu 3 dataset. The model performs well across diverse tasks, scoring an average of 61.0% on a benchmark suite that includes MMLU, GSM8k, and MATH.

Implementation Details

The model is trained with a length-normalized DPO objective using a learning rate of 8E-7, a beta of 5, and an effective batch size of 128. Training uses a maximum sequence length of 2048 tokens and a linear learning-rate schedule with a 0.1 warmup ratio.
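
For context, the length-normalized variant of DPO divides each sequence's log-probability by its response length before forming the preference margin, which is why a relatively large beta such as 5 is workable. The sketch below illustrates the idea in plain PyTorch; it is not Ai2's training code, and the tensor names and shapes are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def length_normalized_dpo_loss(
    policy_chosen_logps,    # (batch,) summed log-probs of chosen responses under the policy
    policy_rejected_logps,  # (batch,) summed log-probs of rejected responses under the policy
    ref_chosen_logps,       # (batch,) same quantities under the frozen reference model
    ref_rejected_logps,     # (batch,)
    chosen_lengths,         # (batch,) number of response tokens in each chosen completion
    rejected_lengths,       # (batch,) number of response tokens in each rejected completion
    beta=5.0,               # beta value reported for this model
):
    # Normalize sequence log-probs by response length before forming the DPO margin.
    pi_logratio = policy_chosen_logps / chosen_lengths - policy_rejected_logps / rejected_lengths
    ref_logratio = ref_chosen_logps / chosen_lengths - ref_rejected_logps / rejected_lengths
    # Standard DPO logistic loss on the length-normalized margin, scaled by beta.
    return -F.logsigmoid(beta * (pi_logratio - ref_logratio)).mean()
```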

  • Built on the Transformer architecture and usable with the Hugging Face Transformers library
  • Uses a dedicated chat template for conversational use (see the loading example after this list)
  • Supports both standard text generation and conversational tasks
  • Achieves 84.2% on GSM8k and 80.6% on IFEval benchmarks
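
As a quick illustration, the snippet below loads the model with Hugging Face Transformers and generates a reply through its chat template. The model ID follows the repository name, and the generation settings are illustrative rather than recommended defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-13B-DPO"  # Hugging Face repository for this model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Build a conversation and render it with the model's chat template.
messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Illustrative generation settings (not tuned recommendations).
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```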

Core Capabilities

  • Advanced mathematical reasoning and problem-solving
  • Strong performance on general knowledge tasks (68.5% on MMLU)
  • Robust safety performance (80.6% on safety benchmarks)
  • Effective chat and instruction following capabilities

Frequently Asked Questions

Q: What makes this model unique?

OLMo-2-1124-13B-DPO stands out for its comprehensive training approach combining SFT and DPO, along with its strong performance across diverse tasks, particularly in mathematics and reasoning. It's part of the fully open OLMo series, making it valuable for research and educational purposes.

Q: What are the recommended use cases?

The model is particularly well-suited for research and educational applications, excelling in mathematical reasoning, general knowledge tasks, and conversational interactions. It's designed for safe and responsible AI deployment while maintaining high performance across various benchmarks.

🍰 Interested in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.