OLMo-7B
| Property | Value |
|---|---|
| Parameter Count | 6.89B |
| Training Tokens | 2.5 Trillion |
| License | Apache 2.0 |
| Paper | Research Paper |
| Authors | Allen Institute for AI (AI2) |
What is OLMo-7B?
OLMo-7B is part of the Open Language Model (OLMo) series, designed to advance the science of language models. Trained on the Dolma dataset, it is a significant step in open-source AI development: a 6.89B-parameter decoder-only transformer with 32 layers and a hidden dimension of 4096.
Implementation Details
The model uses 32 attention heads, non-parametric LayerNorm (no learned scale or bias), and RoPE positional embeddings. It employs the SwiGLU activation function in its feed-forward blocks and supports a context length of 2048 tokens; a minimal sketch of these components follows the list below.
- Full attention mechanism without bias terms
- Sequential block type architecture
- Trained with AdamW optimizer (LR: 3.0E-4)
- Batch size of 2160 instances (~4M tokens)
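The following is an illustrative re-implementation of two of the design choices listed above: non-parametric LayerNorm and a bias-free SwiGLU feed-forward block. It is a simplified sketch for clarity, not the official OLMo code; the 4096 hidden dimension matches the published configuration, while the feed-forward width used here is an assumption chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonParametricLayerNorm(nn.Module):
    """LayerNorm without learnable affine parameters (no scale, no bias)."""

    def __init__(self, d_model: int, eps: float = 1e-5):
        super().__init__()
        self.normalized_shape = (d_model,)
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # weight and bias are omitted, so only normalization is applied.
        return F.layer_norm(x, self.normalized_shape, eps=self.eps)


class SwiGLU(nn.Module):
    """SwiGLU feed-forward: silu(W_gate x) * (W_up x), projected back by W_down."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # Bias-free projections, consistent with the "no bias terms" note above.
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


if __name__ == "__main__":
    # Short sequence for a quick check; the model's full context window is 2048 tokens.
    x = torch.randn(1, 8, 4096)
    x = NonParametricLayerNorm(4096)(x)
    y = SwiGLU(d_model=4096, d_ff=11008)(x)  # 11008 is an assumed FFN width
    print(y.shape)  # torch.Size([1, 8, 4096])
```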
Core Capabilities
- Strong performance on core NLP tasks (71.6% average)
- Competitive results on ARC, COPA, and PIQA benchmarks
- Efficient text generation and completion
- Support for both inference and fine-tuning
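A minimal text-generation sketch is shown below, assuming the allenai/OLMo-7B checkpoint on the Hugging Face Hub and a transformers version with OLMo support. The exact model ID, dtype, and sampling settings are assumptions, not official guidance; older transformers versions may instead require `trust_remote_code=True` or the converted `allenai/OLMo-7B-hf` checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision to fit the 7B weights more easily
    device_map="auto",           # requires the accelerate package
)

prompt = "Language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```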
Frequently Asked Questions
Q: What makes this model unique?
OLMo-7B stands out for its complete transparency in training process, architecture, and evaluation metrics. It is designed specifically to advance research, with all code, checkpoints, and training logs openly available.
Q: What are the recommended use cases?
The model excels in language modeling tasks, research applications, and can be fine-tuned for specific downstream tasks. It's particularly suitable for academic research and applications requiring transparent, reproducible results.
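For adapting the model to a downstream task, the sketch below shows parameter-efficient fine-tuning with LoRA via the peft library. This is a hedged illustration, not an officially documented recipe: the target module names ("q_proj", "v_proj") are assumptions based on common Hugging Face attention-layer naming and may differ across OLMo checkpoint conversions, so verify them against `model.named_modules()` before training.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")  # assumed Hub ID

lora_config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # assumed projection names; verify for your checkpoint
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```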