OLMo-1B
| Property | Value |
|---|---|
| Parameter Count | 1.18B parameters |
| Training Tokens | 3 trillion |
| License | Apache 2.0 |
| Paper | arXiv:2402.00838 |
| Architecture | 16 layers, 2048 hidden size, 16 attention heads |
What is OLMo-1B?
OLMo-1B is part of the Open Language Model (OLMo) series developed by the Allen Institute for AI to enable transparent language model research. It was trained on the Dolma dataset and represents a significant step toward open science in AI development. With 1.18B parameters trained on 3 trillion tokens, it is a compact yet capable language model.
Implementation Details
The model uses a decoder-only Transformer architecture with several specific choices: non-parametric LayerNorm (no learnable scale or bias), RoPE positional embeddings, and full attention. It was trained with the AdamW optimizer at a peak learning rate of 4.0E-4, and the released weights are stored in F32. A minimal sketch of how these pieces fit together follows the list below.
- Context Length: 2048 tokens
- Hidden Size: 2048
- Attention Heads: 16
- Number of Layers: 16
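To make these choices concrete, here is a minimal PyTorch sketch of how the listed dimensions relate and what non-parametric LayerNorm means in practice. The constant names are illustrative assumptions, not OLMo's actual code.

```python
import torch
import torch.nn as nn

# Illustrative constants matching the values listed above (not OLMo's real code).
HIDDEN_SIZE = 2048
NUM_LAYERS = 16
NUM_HEADS = 16
CONTEXT_LENGTH = 2048
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 128 dimensions per attention head

# "Non-parametric LayerNorm" means LayerNorm with no learnable scale or bias;
# in PyTorch this corresponds to disabling the elementwise affine transform.
non_parametric_ln = nn.LayerNorm(HIDDEN_SIZE, elementwise_affine=False)

x = torch.randn(1, CONTEXT_LENGTH, HIDDEN_SIZE)  # (batch, sequence, hidden)
y = non_parametric_ln(x)

print(y.shape)                                                 # torch.Size([1, 2048, 2048])
print(sum(p.numel() for p in non_parametric_ln.parameters()))  # 0 -> no learnable parameters
```

Positional information is injected through RoPE inside the attention layers rather than through learned position embeddings, so the position handling adds no parameters either.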
Core Capabilities
- Strong performance on core NLP tasks with 62.42% average accuracy across standard benchmarks
- Efficient text generation and completion
- Support for both inference and fine-tuning workflows
- Integration with popular ML frameworks through Hugging Face compatibility (see the loading sketch after this list)
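As a sketch of the Hugging Face integration mentioned above, the snippet below loads the model and generates a short completion. It assumes the Transformers-native checkpoint `allenai/OLMo-1B-hf` and a recent `transformers` release; the original `allenai/OLMo-1B` repository instead requires the `ai2-olmo` package and `trust_remote_code=True`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMo-1B-hf"  # assumed Transformers-native checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Simple completion: encode a prompt, sample up to 50 new tokens, decode.
inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```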
Frequently Asked Questions
Q: What makes this model unique?
OLMo-1B stands out for its complete transparency: the training process, data sources, and evaluation metrics are all openly documented. It is designed specifically for research, with extensive documentation and intermediate training checkpoints available.
Q: What are the recommended use cases?
The model is well-suited for research applications, text generation tasks, and domain-specific fine-tuning (a fine-tuning sketch follows below). It performs particularly well on tasks such as COPA (79%) and PIQA (73.7%).
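For domain fine-tuning, here is a hedged sketch using the Hugging Face Trainer and causal language modeling. The dataset, hyperparameters, and output path are placeholders for illustration, not recommendations from the OLMo authors.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "allenai/OLMo-1B-hf"  # assumed Transformers-native checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The collator pads batches, so make sure a pad token is defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus; swap in your own domain text.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.filter(lambda ex: len(ex["text"].strip()) > 0)

def tokenize(batch):
    # Truncate to the model's 2048-token context window.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="olmo-1b-finetuned",      # placeholder output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```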