OLMo-2-1124-13B-SFT
| Property | Value |
|---|---|
| Base Model | OLMo-2-1124-13B |
| License | Apache 2.0 |
| Language | English |
| Training Data | Tülu 3 dataset |
| Paper | Available Here |
What is OLMo-2-1124-13B-SFT?
OLMo-2-1124-13B-SFT is a 13-billion-parameter language model developed by the Allen Institute for AI (Ai2) as part of its Open Language Model (OLMo) series. It is a supervised fine-tune of the base OLMo-2-1124-13B model, trained on the Tülu 3 dataset to improve instruction following and performance across diverse tasks.
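As a minimal sketch, the model can be loaded through the standard Hugging Face transformers API. The repository id `allenai/OLMo-2-1124-13B-SFT` is inferred from the model name, and the prompt is illustrative rather than taken from the model card:

```python
# Minimal sketch: load the SFT model via Hugging Face transformers.
# The repo id "allenai/OLMo-2-1124-13B-SFT" is assumed from the model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-13B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; halves memory vs fp32
    device_map="auto",           # requires the accelerate package
)

# Illustrative prompt; see the chat-template sketch below for the
# conversation format the model was fine-tuned on.
inputs = tokenizer("What is 17 * 24?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```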
Implementation Details
The model was fine-tuned with a learning rate of 7.5e-6, an effective batch size of 128, and a maximum sequence length of 4,096 tokens. Training ran for 2 epochs with a linear learning-rate schedule and a warmup ratio of 0.03.
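For illustration, these hyperparameters map onto Hugging Face `TrainingArguments` roughly as follows. The per-device batch size and gradient-accumulation split are one hypothetical way to reach the effective batch size of 128, not the actual training configuration:

```python
# Hypothetical sketch of the reported SFT hyperparameters expressed as
# Hugging Face TrainingArguments. The batch-size split (8 per device x 16
# accumulation steps = 128 effective) is an assumption for illustration;
# the 4,096-token maximum sequence length is enforced at tokenization time.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="olmo-2-13b-sft",
    learning_rate=7.5e-6,            # reported learning rate
    num_train_epochs=2,              # reported epoch count
    lr_scheduler_type="linear",      # reported linear schedule
    warmup_ratio=0.03,               # reported warmup ratio
    per_device_train_batch_size=8,   # assumed split:
    gradient_accumulation_steps=16,  #   8 * 16 = 128 effective batch size
    bf16=True,                       # assumed mixed-precision setting
)
```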
- Implements a specific chat template with user/assistant markers (see the sketch after this list)
- Supports standard HuggingFace integration
- Includes built-in tokenizer functionality
- Supports a 4,096-token context window
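A minimal sketch of applying the chat template through the tokenizer's built-in support. The message content is made up, and the exact user/assistant markers rendered depend on the template shipped with the tokenizer:

```python
# Sketch: format a conversation with the model's built-in chat template.
# The rendered string uses the user/assistant markers defined in the
# tokenizer config; the example message is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-13B-SFT")
messages = [
    {"role": "user", "content": "Summarize the OLMo 2 training recipe in one sentence."},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return the formatted string, not token ids
    add_generation_prompt=True,  # append the assistant marker for generation
)
print(prompt)
```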
Core Capabilities
- Strong performance on mathematical reasoning (87.4% on GSM8k)
- Improved safety behavior (77.5% average across safety benchmarks)
- Robust MMLU performance (68.6%)
- Specialized in text generation and conversational tasks
- Comprehensive instruction following abilities
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its fully open release (weights, training data, and training code are all available) and its strong performance across diverse tasks, particularly mathematics and reasoning. It achieves results at or near the state of the art among open-source models on several benchmarks while keeping its training process transparent.
Q: What are the recommended use cases?
The model excels in mathematical reasoning, general knowledge tasks, and conversational applications. It's particularly well-suited for research and educational purposes, though users should be aware of its limitations regarding safety and bias handling.