# OLMo-2-1124-7B-DPO
| Property | Value |
|---|---|
| Base Model | OLMo-2-7B-SFT |
| License | Apache 2.0 |
| Paper | Tülu 3 Paper |
| Primary Language | English |
## What is OLMo-2-1124-7B-DPO?
OLMo-2-1124-7B-DPO is a language model developed by the Allen Institute for AI (AI2) as part of its OLMo (Open Language Model) series. As a fully open model, it represents a notable step for open-source AI: it was trained with Direct Preference Optimization (DPO) on top of a checkpoint that had first been supervised fine-tuned on the Tülu 3 dataset.
## Implementation Details
The model was trained with a length-normalized DPO approach using a learning rate of 8e-7, a beta of 5, and an effective batch size of 128. It uses a maximum sequence length of 2048 tokens and a linear learning-rate schedule with a 0.1 warmup ratio.
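As a rough illustration, the sketch below shows what a length-normalized DPO objective looks like, assuming per-token log-probabilities from the policy and a frozen reference (SFT) model are already available. The function name, tensor shapes, and helper are illustrative assumptions, not AI2's training code; only the beta value is taken from the hyperparameters above.

```python
import torch.nn.functional as F

def length_normalized_dpo_loss(
    policy_chosen_logps,    # (batch, seq) per-token log-probs of chosen responses under the policy
    policy_rejected_logps,  # (batch, seq) per-token log-probs of rejected responses under the policy
    ref_chosen_logps,       # (batch, seq) same quantities under the frozen reference (SFT) model
    ref_rejected_logps,
    chosen_mask,            # (batch, seq) 1.0 on response tokens, 0.0 on prompt/padding
    rejected_mask,
    beta: float = 5.0,      # beta reported for this model
):
    # Length normalization: average log-probs over response tokens instead of
    # summing them, which removes the usual bias toward longer responses.
    def mean_logps(logps, mask):
        return (logps * mask).sum(-1) / mask.sum(-1)

    policy_logratio = mean_logps(policy_chosen_logps, chosen_mask) - mean_logps(policy_rejected_logps, rejected_mask)
    ref_logratio = mean_logps(ref_chosen_logps, chosen_mask) - mean_logps(ref_rejected_logps, rejected_mask)

    # Standard DPO objective applied to the length-normalized log-ratios.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```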
- Trained on a custom preference dataset mix
- Uses a chat template with user/assistant turns
- Works with the standard HuggingFace transformers pipeline (see the loading example after this list)
- Optimized for both conversational and analytical tasks
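Because the model ships with a chat template and works with the standard transformers API, a typical loading and generation flow looks like the sketch below (assuming a recent transformers install with accelerate for `device_map="auto"`); the prompt and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The tokenizer's chat template handles the user/assistant formatting.
messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```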
## Core Capabilities
- Strong performance on mathematical reasoning (GSM8k: 82.4%)
- High safety scores (81.5% on safety benchmarks)
- Effective on general knowledge tasks (MMLU: 63.4%)
- Improved chat capabilities through DPO training
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its fully open release and strong performance across diverse tasks: it is particularly strong in mathematical reasoning and safety while remaining competitive on general knowledge benchmarks.
Q: What are the recommended use cases?
This model is particularly well-suited for research and educational applications, especially in scenarios requiring mathematical reasoning, structured problem-solving, and safe conversational interactions. It's designed to handle both chat-based interactions and specific task-oriented applications.