OLMo-2-0325-32B
| Property | Value |
| --- | --- |
| Parameter Count | 32 billion |
| Training Tokens | 6 trillion |
| License | Apache 2.0 |
| Research Paper | arXiv:2501.00656 |
| Context Length | 4096 tokens |
What is OLMo-2-0325-32B?
OLMo-2-0325-32B is the largest model in the OLMo 2 family, developed by the Allen Institute for AI (AI2). It is a Transformer-style autoregressive language model pretrained on the OLMo-Mix-1124 dataset for 6 trillion tokens. With 64 layers, a hidden size of 5120, and 40 attention heads, it is a strong competitor in the open-source AI landscape.
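A minimal sketch of loading the model with Hugging Face Transformers and confirming these architecture details; the repository ID `allenai/OLMo-2-0325-32B` and the use of a recent `transformers` release with OLMo 2 support are assumptions, so adjust to the actual hub listing:

```python
# Minimal sketch: load the model and inspect its architecture.
# Assumes the hub ID "allenai/OLMo-2-0325-32B" and a transformers
# version that includes OLMo 2 support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-0325-32B"  # assumed Hugging Face repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32B parameters in bf16 need roughly 64 GB of memory
    device_map="auto",           # shard across available GPUs
)

# The config should reflect the figures described above:
# 64 layers, hidden size 5120, 40 attention heads, 4096-token context.
cfg = model.config
print(cfg.num_hidden_layers, cfg.hidden_size,
      cfg.num_attention_heads, cfg.max_position_embeddings)
```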
Implementation Details
The model underwent a two-stage training process. Stage 1 consisted of pretraining on 6 trillion tokens (approximately 1.5 epochs over OLMo-Mix-1124), while Stage 2 continued training (mid-training) on the Dolmino-Mix-1124 dataset. The final checkpoint is a weight-averaged merge of multiple Stage 2 runs: three trained on 100B tokens and one trained on 300B tokens.
- Architecture: 64 layers, 5120 hidden size, 40 attention heads
- Context window: 4096 tokens
- Training FLOPs: approximately 1.3 × 10^24 (see the sanity check after this list)
- Average benchmark score: 72.9 across standard evaluations
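The compute figure is consistent with the common 6·N·D estimate for dense decoder-only transformers (roughly six FLOPs per parameter per training token). A quick back-of-the-envelope check, treating the rounded figures above as exact:

```python
# Back-of-the-envelope check of the training-compute figure using the
# standard 6*N*D approximation for dense decoder-only transformers.
N = 32e9   # parameters (32 billion, per the table above)
D = 6e12   # Stage 1 training tokens (6 trillion)

flops = 6 * N * D
print(f"{flops:.2e}")  # ~1.15e+24, the same order as the reported 1.3e24
```

The small gap is expected: the 6·N·D rule ignores attention FLOPs and the additional Stage 2 tokens.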
Core Capabilities
- Strong performance on complex reasoning tasks (90.4% on ARC-Challenge)
- Robust natural language understanding (89.7% on WinoGrande)
- Advanced mathematical reasoning (78.8% on GSM8K; see the prompt sketch after this list)
- Competitive question answering (88.0% on TriviaQA)
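As an illustration of the kind of prompt behind the GSM8K figure, here is a minimal generation sketch. The repository ID is an assumption as before, and the example uses simple greedy decoding rather than the few-shot prompting and answer extraction used in the actual evaluations:

```python
import torch
from transformers import pipeline

# Assumed Hugging Face repository ID; adjust to the actual hub listing.
generator = pipeline(
    "text-generation",
    model="allenai/OLMo-2-0325-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A GSM8K-style word problem (illustrative, not taken from the benchmark).
prompt = (
    "Question: A bakery bakes 7 trays of muffins a day with 24 muffins per tray. "
    "How many muffins does it bake in 5 days?\nAnswer:"
)

result = generator(prompt, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
```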
Frequently Asked Questions
Q: What makes this model unique?
OLMo-2-0325-32B stands out for being fully open: the weights, training data, code, and intermediate checkpoints are all released, alongside comprehensive documentation. It achieves state-of-the-art results among fully open models across multiple benchmarks while remaining competitive with partially open and closed models.
Q: What are the recommended use cases?
The model excels at tasks such as complex reasoning, mathematical problem-solving, and question answering. It is particularly suitable for research applications and can be fine-tuned for specific tasks using the provided training scripts and documentation.
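Independent of AI2's own training scripts, a minimal supervised fine-tuning sketch using the Hugging Face Trainer is shown below. The dataset file, hyperparameters, and single-process setup are illustrative assumptions, and a 32B model realistically requires multi-GPU sharding (e.g. FSDP or DeepSpeed), which is omitted here for brevity:

```python
# Minimal sketch of task-specific supervised fine-tuning with the Hugging Face
# Trainer, independent of AI2's provided training scripts. The repository ID,
# dataset file, and hyperparameters are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "allenai/OLMo-2-0325-32B"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure a pad token for batching

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Hypothetical task dataset: a JSONL file with a "text" column.
dataset = load_dataset("json", data_files="my_task_data.jsonl", split="train")

def tokenize(batch):
    # Truncate to the model's 4096-token context window.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="olmo2-32b-task-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```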