OLMo-2-1124-13B-DPO
| Property | Value |
|---|---|
| Base Model | OLMo-2-13B-SFT |
| License | Apache 2.0 |
| Language | English |
| Paper | Tülu 3 Paper |
What is OLMo-2-1124-13B-DPO?
OLMo-2-1124-13B-DPO is a language model developed by Allen AI and a significant step in the OLMo series. It is a 13B-parameter model that has undergone Direct Preference Optimization (DPO) training after being fine-tuned on a specialized variant of the Tülu 3 dataset. The model performs well across diverse tasks, scoring an average of 61.0% across benchmarks including MMLU, GSM8k, and MATH.
Implementation Details
The model was trained with a length-normalized DPO objective using a learning rate of 8e-7, a DPO beta of 5, and an effective batch size of 128. Training used a maximum sequence length of 2048 tokens and a linear learning-rate schedule with a 0.1 warmup ratio.
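The length-normalized objective can be sketched in a few lines. This is a minimal illustration, not Ai2's training code: the reference-model log-ratio that DPO normally uses is folded into the `logp_*` arguments for brevity, and the input values below are toy numbers.

```python
import math

def length_normalized_dpo_loss(logp_chosen, logp_rejected,
                               len_chosen, len_rejected, beta=5.0):
    """Length-normalized DPO: average the summed log-probabilities
    over the response length before taking the preference margin,
    so longer responses are not preferred merely for accumulating
    more log-probability mass."""
    margin = beta * (logp_chosen / len_chosen - logp_rejected / len_rejected)
    # Standard DPO maps the margin through -log(sigmoid(margin)).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: the chosen response has the better per-token log-prob,
# so the loss is small.
loss = length_normalized_dpo_loss(-40.0, -55.0, 20, 22, beta=5.0)
```

With beta = 5, as in this model's training recipe, even modest per-token margins saturate the sigmoid quickly, which makes the objective relatively tolerant of noisy preference pairs.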
- Built on the Transformer architecture and usable via the Hugging Face Transformers library
- Ships with a dedicated chat template for multi-turn conversation
- Supports both standard text generation and conversational tasks
- Achieves 84.2% on GSM8k and 80.6% on IFEval benchmarks
Core Capabilities
- Advanced mathematical reasoning and problem-solving
- Strong performance on general knowledge tasks (68.5% on MMLU)
- Robust safety considerations (80.6% on safety benchmarks)
- Effective chat and instruction following capabilities
Frequently Asked Questions
Q: What makes this model unique?
OLMo-2-1124-13B-DPO stands out for its training pipeline combining SFT and DPO, and for its strong performance across diverse tasks, particularly mathematics and reasoning. As part of the fully open OLMo series, it is especially valuable for research and educational use.
Q: What are the recommended use cases?
The model is particularly well-suited for research and educational applications, excelling in mathematical reasoning, general knowledge tasks, and conversational interactions. It's designed for safe and responsible AI deployment while maintaining high performance across various benchmarks.