OLMo-1B-0724-hf
| Property | Value |
|---|---|
| Parameter Count | 1.28B |
| Training Tokens | 3.05 Trillion |
| License | Apache 2.0 |
| Paper | arxiv:2402.00838 |
What is OLMo-1B-0724-hf?
OLMo-1B-0724-hf is the July 2024 release of Allen AI's Open Language Model (OLMo) series and a significant improvement over its predecessor. This 1.28B-parameter model is trained on an enhanced version of the Dolma dataset with better deduplication and quality filtering, and it shows clear performance gains, including a 4.4-point increase on the HellaSwag benchmark.
Implementation Details
The model uses a 16-layer transformer with a hidden size of 2048 and 16 attention heads, and supports a context length of 4096 tokens (see the loading sketch after the list below). Training proceeds in two stages: an initial phase on the Dolma 1.7 dataset followed by a second phase on a higher-quality subset of the data.
- Advanced staged training approach with cosine learning rate scheduling
- Optimized with AdamW optimizer (learning rate: 4.0E-4)
- Implements full attention mechanism with non-parametric LayerNorm
- Supports efficient quantization for improved inference speed
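As a rough illustration of the configuration above, the sketch below loads the checkpoint with Hugging Face Transformers and prints the architecture fields; the commented 8-bit path is one possible quantized setup and assumes the bitsandbytes and accelerate packages plus a CUDA device.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the released checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B-0724-hf")
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1B-0724-hf",
    torch_dtype=torch.float32,  # use torch.bfloat16 to roughly halve memory
)

# Inspect the architecture values quoted above.
cfg = model.config
print(cfg.num_hidden_layers)        # 16 layers
print(cfg.hidden_size)              # 2048
print(cfg.num_attention_heads)      # 16
print(cfg.max_position_embeddings)  # 4096-token context

# Optional: 8-bit quantized loading (assumes bitsandbytes, accelerate, and a GPU).
# from transformers import BitsAndBytesConfig
# model_8bit = AutoModelForCausalLM.from_pretrained(
#     "allenai/OLMo-1B-0724-hf",
#     quantization_config=BitsAndBytesConfig(load_in_8bit=True),
#     device_map="auto",
# )
```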
Core Capabilities
- Strong performance in multiple benchmarks (65.0 average score across standard tasks)
- Excels on tasks such as SciQ (93.4%) and PIQA (74.9%)
- Efficient text generation with support for standard sampling parameters (see the generation sketch after this list)
- Native integration with HuggingFace Transformers library
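A minimal generation sketch with the Transformers API is shown below; the prompt and sampling values (top_k, top_p, temperature, max_new_tokens) are illustrative choices rather than values prescribed by the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B-0724-hf")
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B-0724-hf")

# Illustrative prompt; any text works.
inputs = tokenizer("Language models are", return_tensors="pt")

# Sampled decoding with illustrative parameter values.
output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Greedy decoding (do_sample=False) is also available when deterministic output is preferred.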
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its open-science approach, with full transparency about its training data and process, and for the clear gains delivered by its staged training recipe. It achieves competitive performance despite its relatively small size compared to larger models.
Q: What are the recommended use cases?
The model is well suited to general language modeling, research applications, and fine-tuning for specific downstream tasks. It is particularly attractive where strong language understanding and generation are needed at modest computational cost.
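As a rough starting point for the fine-tuning use case, the sketch below adapts the checkpoint to a causal-LM corpus with the Transformers Trainer; the WikiText-2 dataset, the padding fallback, and all hyperparameters are illustrative assumptions rather than settings from the OLMo paper or model card.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "allenai/OLMo-1B-0724-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Ensure a pad token exists for batching (assumption: reuse EOS if unset).
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Illustrative corpus; swap in your own downstream-task data.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop empty lines

# Causal-LM objective: labels are the input ids, shifted inside the model.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="olmo-1b-finetuned",   # hypothetical output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-5,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```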