OLMo-2-1124-13B-SFT
| Property | Value |
|---|---|
| Base Model | OLMo-2-1124-13B |
| License | Apache 2.0 |
| Language | English |
| Training Data | Tülu 3 dataset |
| Paper | Available Here |
What is OLMo-2-1124-13B-SFT?
OLMo-2-1124-13B-SFT is a 13-billion-parameter language model developed by the Allen Institute for AI (Ai2) as part of its Open Language Model (OLMo) series. It is a supervised fine-tune of the base OLMo-2-1124-13B model, trained on the Tülu 3 dataset to improve instruction following and performance across diverse tasks.
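As a minimal sketch, the model can be loaded through the standard Hugging Face transformers API. The repository id `allenai/OLMo-2-1124-13B-SFT` is inferred from the model name, and the prompt is illustrative rather than taken from the model card:

```python
# Minimal sketch: load the SFT model via Hugging Face transformers.
# The repo id "allenai/OLMo-2-1124-13B-SFT" is assumed from the model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-13B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; halves memory vs fp32
    device_map="auto",           # requires the accelerate package
)

# Illustrative prompt; see the chat-template sketch below for the
# conversation format the model was fine-tuned on.
inputs = tokenizer("What is 17 * 24?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```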
Implementation Details
The model was fine-tuned with a learning rate of 7.5e-6, an effective batch size of 128, and a maximum sequence length of 4,096 tokens. Training ran for 2 epochs with a linear learning-rate schedule and a warmup ratio of 0.03.
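For illustration, these hyperparameters map onto Hugging Face `TrainingArguments` roughly as follows. The per-device batch size and gradient-accumulation split are one hypothetical way to reach the effective batch size of 128, not the actual training configuration:

```python
# Hypothetical sketch of the reported SFT hyperparameters expressed as
# Hugging Face TrainingArguments. The batch-size split (8 per device x 16
# accumulation steps = 128 effective) is an assumption for illustration;
# the 4,096-token maximum sequence length is enforced at tokenization time.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="olmo-2-13b-sft",
    learning_rate=7.5e-6,            # reported learning rate
    num_train_epochs=2,              # reported epoch count
    lr_scheduler_type="linear",      # reported linear schedule
    warmup_ratio=0.03,               # reported warmup ratio
    per_device_train_batch_size=8,   # assumed split:
    gradient_accumulation_steps=16,  #   8 * 16 = 128 effective batch size
    bf16=True,                       # assumed mixed-precision setting
)
```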
- Implements a specific chat template with user/assistant markers (see the sketch after this list)
- Supports standard HuggingFace integration
- Includes built-in tokenizer functionality
- Supports a 4,096-token context window
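A minimal sketch of applying the chat template through the tokenizer's built-in support. The message content is made up, and the exact user/assistant markers rendered depend on the template shipped with the tokenizer:

```python
# Sketch: format a conversation with the model's built-in chat template.
# The rendered string uses the user/assistant markers defined in the
# tokenizer config; the example message is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-13B-SFT")
messages = [
    {"role": "user", "content": "Summarize the OLMo 2 training recipe in one sentence."},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return the formatted string, not token ids
    add_generation_prompt=True,  # append the assistant marker for generation
)
print(prompt)
```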
Core Capabilities
- Strong performance on mathematical reasoning (87.4% on GSM8k)
- Improved safety behavior (77.5% average across safety benchmarks)
- Robust MMLU performance (68.6%)
- Specialized in text generation and conversational tasks
- Comprehensive instruction following abilities
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its fully open release (weights, training data, and training code are all available) and its strong performance across diverse tasks, particularly mathematics and reasoning. It achieves results at or near the state of the art among open-source models on several benchmarks while keeping its training process transparent.
Q: What are the recommended use cases?
The model excels in mathematical reasoning, general knowledge tasks, and conversational applications. It's particularly well-suited for research and educational purposes, though users should be aware of its limitations regarding safety and bias handling.