OLMo-7B

OLMo-7B is a 6.89B-parameter open language model trained on 2.5T tokens, with 32 layers and a 4096-dimensional hidden size, built to advance research on language models.

Parameter Count: 6.89B
Training Tokens: 2.5 Trillion
License: Apache 2.0
Paper: Research Paper
Authors: Allen Institute for AI (AI2)

What is OLMo-7B?

OLMo-7B is part of the Open Language Model (OLMo) series designed to advance the science of language models. Trained on the Dolma dataset, it represents a significant step in open-source AI development with its 6.89B parameters and innovative architecture featuring 32 layers and 4096 hidden dimensions.

Implementation Details

The model is a decoder-only transformer with 32 attention heads, non-parametric LayerNorm, and rotary positional embeddings (RoPE). It uses the SwiGLU activation function and supports a context length of 2048 tokens. A configuration sketch after the list below shows how these settings can be checked programmatically.

  • Full attention mechanism without bias terms
  • Sequential block type architecture
  • Trained with AdamW optimizer (LR: 3.0E-4)
  • Batch size of 2160 instances (~4M tokens)

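To make these settings concrete, here is a minimal sketch that reads the published configuration with Hugging Face Transformers. It assumes the Transformers-compatible checkpoint ID allenai/OLMo-7B-hf; the repo ID and exact config field names may differ for other OLMo releases.

```python
# Minimal sketch: inspect OLMo-7B's architecture hyperparameters.
# Assumes the Transformers-compatible checkpoint "allenai/OLMo-7B-hf".
from transformers import AutoConfig

config = AutoConfig.from_pretrained("allenai/OLMo-7B-hf")

# Values should match the specification above.
print(config.num_hidden_layers)        # 32 layers
print(config.hidden_size)              # 4096 hidden size
print(config.num_attention_heads)      # 32 attention heads
print(config.max_position_embeddings)  # 2048-token context length
```
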
Core Capabilities

  • Strong performance on core NLP tasks (71.6% average across the core evaluation suite)
  • Competitive results on ARC, COPA, and PIQA benchmarks
  • Efficient text generation and completion (a generation sketch follows this list)
  • Support for both inference and fine-tuning

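The sketch below illustrates basic inference. It assumes the Transformers-compatible allenai/OLMo-7B-hf checkpoint, and the sampling settings are illustrative rather than recommended values.

```python
# Minimal text-generation sketch (assumed checkpoint: "allenai/OLMo-7B-hf").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-hf")
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-hf",
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",          # requires the accelerate package
)

prompt = "Language modeling is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
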
Frequently Asked Questions

Q: What makes this model unique?

OLMo-7B stands out for the complete transparency of its training process, architecture, and evaluation. It is designed specifically to advance research, with training code, checkpoints, and logs all openly available.

Q: What are the recommended use cases?

The model excels at language modeling, suits research applications, and can be fine-tuned for specific downstream tasks; a minimal fine-tuning sketch is shown below. It is particularly well suited to academic research and to applications that require transparent, reproducible results.

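As a rough illustration of the fine-tuning path, the sketch below uses the standard Hugging Face Trainer for causal-LM fine-tuning. The checkpoint ID, the train.txt data file, and all hyperparameters are illustrative assumptions, not a recipe from the OLMo authors.

```python
# Minimal causal-LM fine-tuning sketch with the Hugging Face Trainer.
# Assumptions: checkpoint "allenai/OLMo-7B-hf" and a plain-text file "train.txt".
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "allenai/OLMo-7B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batch padding

# Hypothetical corpus: one training example per line of train.txt.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="olmo-7b-finetuned",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```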