OLMo-2-0325-32B
| Property | Value |
| --- | --- |
| Parameter Count | 32 billion |
| Training Tokens | 6 trillion |
| License | Apache 2.0 |
| Research Paper | arXiv:2501.00656 |
| Context Length | 4096 tokens |
What is OLMo-2-0325-32B?
OLMo-2-0325-32B is the largest model in the OLMo 2 family, developed by the Allen Institute for AI (AI2). It is a Transformer-style autoregressive language model pretrained on the OLMo-Mix-1124 dataset for 6 trillion tokens. With 64 layers, a hidden size of 5120, and 40 attention heads, it is a strong competitor in the open-source AI landscape.
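A minimal sketch of loading the model with Hugging Face Transformers and confirming these architecture details; the repository ID `allenai/OLMo-2-0325-32B` and the use of a recent `transformers` release with OLMo 2 support are assumptions, so adjust to the actual hub listing:

```python
# Minimal sketch: load the model and inspect its architecture.
# Assumes the hub ID "allenai/OLMo-2-0325-32B" and a transformers
# version that includes OLMo 2 support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-0325-32B"  # assumed Hugging Face repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32B parameters in bf16 need roughly 64 GB of memory
    device_map="auto",           # shard across available GPUs
)

# The config should reflect the figures described above:
# 64 layers, hidden size 5120, 40 attention heads, 4096-token context.
cfg = model.config
print(cfg.num_hidden_layers, cfg.hidden_size,
      cfg.num_attention_heads, cfg.max_position_embeddings)
```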
Implementation Details
The model underwent a two-stage training process. Stage 1 consisted of pretraining on 6 trillion tokens (approximately 1.5 epochs over OLMo-Mix-1124), while Stage 2 continued training (mid-training) on the Dolmino-Mix-1124 dataset. The final checkpoint is a weight-averaged merge of multiple Stage 2 runs: three trained on 100B tokens and one trained on 300B tokens.
- Architecture: 64 layers, 5120 hidden size, 40 attention heads
- Context window: 4096 tokens
- Training FLOPs: approximately 1.3 × 10^24 (see the sanity check after this list)
- Average benchmark score: 72.9 across standard evaluations
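The compute figure is consistent with the common 6·N·D estimate for dense decoder-only transformers (roughly six FLOPs per parameter per training token). A quick back-of-the-envelope check, treating the rounded figures above as exact:

```python
# Back-of-the-envelope check of the training-compute figure using the
# standard 6*N*D approximation for dense decoder-only transformers.
N = 32e9   # parameters (32 billion, per the table above)
D = 6e12   # Stage 1 training tokens (6 trillion)

flops = 6 * N * D
print(f"{flops:.2e}")  # ~1.15e+24, the same order as the reported 1.3e24
```

The small gap is expected: the 6·N·D rule ignores attention FLOPs and the additional Stage 2 tokens.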
Core Capabilities
- Strong performance on complex reasoning tasks (90.4% on ARC-Challenge)
- Robust natural language understanding (89.7% on WinoGrande)
- Advanced mathematical reasoning (78.8% on GSM8K; see the prompt sketch after this list)
- Competitive question answering (88.0% on TriviaQA)
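As an illustration of the kind of prompt behind the GSM8K figure, here is a minimal generation sketch. The repository ID is an assumption as before, and the example uses simple greedy decoding rather than the few-shot prompting and answer extraction used in the actual evaluations:

```python
import torch
from transformers import pipeline

# Assumed Hugging Face repository ID; adjust to the actual hub listing.
generator = pipeline(
    "text-generation",
    model="allenai/OLMo-2-0325-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A GSM8K-style word problem (illustrative, not taken from the benchmark).
prompt = (
    "Question: A bakery bakes 7 trays of muffins a day with 24 muffins per tray. "
    "How many muffins does it bake in 5 days?\nAnswer:"
)

result = generator(prompt, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
```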
Frequently Asked Questions
Q: What makes this model unique?
OLMo-2-0325-32B stands out for being fully open: the weights, training data, code, and intermediate checkpoints are all released, alongside comprehensive documentation. It achieves state-of-the-art results among fully open models across multiple benchmarks while remaining competitive with partially open and closed models.
Q: What are the recommended use cases?
The model excels at tasks such as complex reasoning, mathematical problem-solving, and question answering. It is particularly suitable for research applications and can be fine-tuned for specific tasks using the provided training scripts and documentation.
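Independent of AI2's own training scripts, a minimal supervised fine-tuning sketch using the Hugging Face Trainer is shown below. The dataset file, hyperparameters, and single-process setup are illustrative assumptions, and a 32B model realistically requires multi-GPU sharding (e.g. FSDP or DeepSpeed), which is omitted here for brevity:

```python
# Minimal sketch of task-specific supervised fine-tuning with the Hugging Face
# Trainer, independent of AI2's provided training scripts. The repository ID,
# dataset file, and hyperparameters are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "allenai/OLMo-2-0325-32B"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure a pad token for batching

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Hypothetical task dataset: a JSONL file with a "text" column.
dataset = load_dataset("json", data_files="my_task_data.jsonl", split="train")

def tokenize(batch):
    # Truncate to the model's 4096-token context window.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="olmo2-32b-task-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```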