OLMo-7B-0724-hf
| Property | Value |
|---|---|
| Parameter Count | 7 billion |
| Training Tokens | 2.75 trillion |
| License | Apache 2.0 |
| Context Length | 4096 tokens |
| Developer | Allen Institute for AI (AI2) |
What is OLMo-7B-0724-hf?
OLMo-7B-0724-hf is an open-source large language model developed by the Allen Institute for AI (AI2) as part of its initiative to advance the science of language models. It is a 32-layer decoder-only transformer with a hidden size of 4096 and 32 attention heads, released together with its training code, data, and checkpoints so that the full development process can be studied and reproduced.
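A minimal usage sketch is shown below. It assumes the weights are hosted on the Hugging Face Hub under the `allenai/OLMo-7B-0724-hf` model ID and that a recent `transformers` release with native OLMo support (4.40 or later) is installed; the prompt and sampling settings are only illustrative.

```python
# Minimal text-generation sketch. Assumes transformers >= 4.40 (native OLMo support)
# and that the weights are hosted as "allenai/OLMo-7B-0724-hf" on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B-0724-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
    device_map="auto",
)

prompt = "Photosynthesis is the process by which"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```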
Implementation Details
The model follows a two-stage training approach: an initial stage on the Dolma 1.7 dataset with a cosine learning rate schedule, followed by a second stage on a high-quality subset of the data. The architecture uses non-parametric LayerNorm, RoPE positional embeddings, and attention without bias terms. The key hyperparameters are listed below, followed by a short config check.
- Architecture: 32 layers, 4096 hidden size, 32 attention heads
- Training: 2.75 trillion tokens on the Dolma 1.7 dataset
- Optimizer: AdamW with peak learning rate of 3.0E-4
- Context window: 4096 tokens
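As a sanity check on these numbers, the published configuration can be inspected directly. This is a sketch that assumes the standard `transformers` config field names for OLMo models.

```python
# Sketch: confirm the architecture hyperparameters from the published config.
# Field names follow the standard transformers OLMo configuration.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("allenai/OLMo-7B-0724-hf")
print(cfg.num_hidden_layers)        # 32 layers
print(cfg.hidden_size)              # 4096 hidden size
print(cfg.num_attention_heads)      # 32 attention heads
print(cfg.max_position_embeddings)  # 4096-token context window
```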
Core Capabilities
- Strong performance on GSM8k (35% accuracy)
- Competitive MMLU performance (53.4%)
- Excellent scientific QA capabilities (97% on SciQ; a scoring sketch follows this list)
- Robust general reasoning and comprehension
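Closed-book multiple-choice benchmarks like SciQ are commonly scored by comparing the log-likelihood the model assigns to each candidate answer. The sketch below illustrates that idea with a made-up question; it is not the exact evaluation pipeline used to produce the numbers above.

```python
# Sketch: rank candidate answers by the log-likelihood the model assigns to them.
# The question and choices are illustrative, not drawn from SciQ itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B-0724-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

question = "Question: What gas do plants absorb during photosynthesis?\nAnswer:"
choices = [" carbon dioxide", " oxygen", " nitrogen", " helium"]

def answer_logprob(prompt: str, answer: str) -> float:
    """Sum of log-probabilities of the answer tokens, conditioned on the prompt."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1].float(), dim=-1)  # position i predicts token i+1
    return sum(
        log_probs[pos, full_ids[0, pos + 1]].item()
        for pos in range(prompt_len - 1, full_ids.shape[1] - 1)
    )

scores = {c: answer_logprob(question, c) for c in choices}
print(max(scores, key=scores.get))  # the model's preferred answer
```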
Frequently Asked Questions
Q: What makes this model unique?
OLMo-7B stands out for its completely open approach to model development: the training code, intermediate checkpoints, and training logs are all publicly available. Its two-stage training process and strong scientific QA results make it particularly valuable for research applications.
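Because intermediate checkpoints are part of the release, they can in principle be pulled as git revisions of the Hub repository. The sketch below assumes that convention; the revision name shown is hypothetical, so list the repository's branches to find the real ones.

```python
# Sketch: discover and load intermediate training checkpoints, assuming they are
# exposed as git revisions (branches) of the Hub repository.
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

refs = list_repo_refs("allenai/OLMo-7B-0724-hf")
print([branch.name for branch in refs.branches])  # available checkpoint revisions

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-0724-hf",
    revision="step1000-tokens5B",  # hypothetical revision name, for illustration only
)
```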
Q: What are the recommended use cases?
The model performs well on scientific QA, mathematical reasoning, and general language understanding tasks. It is particularly suitable for research applications and as a foundation for further fine-tuning in specialized domains.
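For domain adaptation, a parameter-efficient approach such as LoRA keeps the 7B base weights frozen and trains small adapter matrices instead. The sketch below uses the `peft` library; the target module names are an assumption based on the Llama-style projection naming used in the Hugging Face OLMo implementation, so verify them against the loaded model before training.

```python
# Sketch: attach LoRA adapters for parameter-efficient domain fine-tuning.
# target_modules assumes Llama-style attention projection names; check the
# printed model structure if these do not match.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-0724-hf", torch_dtype=torch.bfloat16, device_map="auto"
)

lora_cfg = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapters train; the base model stays frozen
```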