OLMo-1B-hf

allenai

An open-source 1.18B-parameter language model from Allen AI, trained on 3 trillion tokens. It offers strong performance for its size and is released under the Apache 2.0 license.

Property         Value
Parameter Count  1.18B
Training Tokens  3 trillion
Context Length   2048 tokens
License          Apache 2.0
Paper            arXiv:2402.00838

What is OLMo-1B-hf?

OLMo-1B-hf is part of the Open Language Model (OLMo) series developed by Allen AI to advance the science of language modeling. This Hugging Face-compatible version has 16 layers, a hidden size of 2048, and 16 attention heads, and was trained on the comprehensive Dolma dataset.
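As a quick sanity check, the architecture described above can be read straight from the model configuration. This is a minimal sketch that assumes the Hugging Face model ID allenai/OLMo-1B-hf and the standard config attribute names used by Transformers causal-LM configs:

```python
from transformers import AutoConfig

# Inspect the architecture of OLMo-1B-hf from its published config.
# Assumes the Hugging Face model ID "allenai/OLMo-1B-hf".
config = AutoConfig.from_pretrained("allenai/OLMo-1B-hf")

print(config.num_hidden_layers)        # expected: 16
print(config.hidden_size)              # expected: 2048
print(config.num_attention_heads)      # expected: 16
print(config.max_position_embeddings)  # expected: 2048
```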

Implementation Details

The model utilizes a modern Transformer architecture with several optimizations:

  • Non-parametric LayerNorm and RoPE positional embeddings
  • Full attention mechanism with sequential block type
  • SwiGLU activation function
  • Training optimized with AdamW (lr=4.0e-4, weight decay=0.1); these settings are sketched in code below
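The optimizer settings listed above can be reproduced with plain PyTorch. The following is a hedged sketch rather than the authors' training code; it assumes the allenai/OLMo-1B-hf model ID and omits the learning-rate schedule used in the actual training run:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch of the stated optimizer settings, not the original OLMo
# training code. The real run also used an LR schedule not shown here.
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B-hf")
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=4.0e-4,         # learning rate from the model card
    weight_decay=0.1,  # weight decay from the model card
)
```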

Core Capabilities

  • Strong performance on core NLP tasks (62.42% average across standard benchmarks)
  • Competitive with larger models on some tasks
  • Efficient text generation with support for various sampling methods (see the generation example after this list)
  • Easy integration with Hugging Face Transformers library
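The snippet below is a minimal generation sketch assuming the Hugging Face model ID allenai/OLMo-1B-hf and a transformers version recent enough to include OLMo support; the prompt and sampling parameters are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load tokenizer and model; assumes the "allenai/OLMo-1B-hf" model ID.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B-hf")
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B-hf")

inputs = tokenizer("Language modeling is", return_tensors="pt")

# Nucleus (top-p) sampling with temperature; values are illustrative.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Setting do_sample=False switches to greedy decoding; beam search and other strategies are available through the same generate interface.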

Frequently Asked Questions

Q: What makes this model unique?

OLMo-1B-hf stands out for its complete transparency in training data, methodology, and evaluation metrics. It achieves impressive performance for its size class, particularly in tasks like COPA (79%) and PIQA (73.7%).

Q: What are the recommended use cases?

The model is well-suited for research, text generation, and as a foundation for fine-tuning on specific applications (a minimal fine-tuning sketch follows below). It is particularly effective for tasks requiring strong reasoning capabilities within its 2048-token context window.
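For the fine-tuning use case, a minimal causal-LM training loop might look like the following. This is a sketch on toy data, assuming the allenai/OLMo-1B-hf model ID; a real run would use a proper dataset, batching, padding, and a learning-rate schedule:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Toy fine-tuning sketch; assumes the "allenai/OLMo-1B-hf" model ID.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B-hf")
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B-hf")
model.train()

texts = ["Example document one.", "Example document two."]  # toy data
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

for text in texts:
    batch = tokenizer(text, return_tensors="pt")
    # For causal-LM fine-tuning, the labels are the input ids themselves;
    # the model shifts them internally when computing the loss.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```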
