OLMo-1B

Maintained By
allenai


Property           Value
Parameter Count    1.18B parameters
Training Tokens    3 trillion
License            Apache 2.0
Paper              arxiv:2402.00838
Architecture       16 layers, 2048 hidden size, 16 attention heads

What is OLMo-1B?

OLMo-1B is part of the Open Language Model (OLMo) series developed by the Allen Institute for AI to enable transparent language model research. It is trained on the Dolma dataset and represents a significant step toward open science in AI development. The model has 1.18B parameters and was trained on 3 trillion tokens, making it compact yet capable.

Implementation Details

The model employs a Transformer architecture with several specific design choices, including non-parametric LayerNorm, rotary positional embeddings (RoPE), and full (non-sparse) attention. It was trained with the AdamW optimizer at a peak learning rate of 4.0E-4, and the released weights are stored as F32 tensors. The key hyperparameters are listed below, followed by a sketch of how to read them back from the published config.

  • Context Length: 2048 tokens
  • Hidden Size: 2048
  • Attention Heads: 16
  • Number of Layers: 16
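
As a quick sanity check, these hyperparameters can be inspected from the model configuration via transformers. The checkpoint ID allenai/OLMo-1B-hf and the attribute names below follow common transformers conventions and are assumptions, not values taken from this card.

```python
# Hedged sketch: reading OLMo-1B's architecture hyperparameters from its config.
# Checkpoint ID and attribute names are assumptions, not taken from this card.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("allenai/OLMo-1B-hf")

print(config.num_hidden_layers)        # expected: 16
print(config.hidden_size)              # expected: 2048
print(config.num_attention_heads)      # expected: 16
print(config.max_position_embeddings)  # expected: 2048 (context length)
```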

Core Capabilities

  • Strong performance on core NLP tasks with 62.42% average accuracy across standard benchmarks
  • Efficient text generation and completion tasks
  • Support for both inference and fine-tuning workflows
  • Integration with popular ML frameworks through HuggingFace compatibility (see the loading sketch after this list)
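
For a concrete starting point, here is a minimal loading and generation sketch using the Hugging Face transformers API. The checkpoint ID allenai/OLMo-1B-hf, the prompt, and the sampling settings are illustrative assumptions; older allenai/OLMo-1B revisions may require the ai2-olmo (hf_olmo) package instead.

```python
# Hedged sketch: load OLMo-1B and generate a completion with transformers.
# Checkpoint ID, prompt, and sampling settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-1B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# return_token_type_ids=False keeps the tokenizer output compatible with generate()
inputs = tokenizer("Language modeling is ", return_tensors="pt", return_token_type_ids=False)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```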

Frequently Asked Questions

Q: What makes this model unique?

OLMo-1B stands out for the complete transparency of its training process, data sources, and evaluation metrics. It is designed specifically for research, with extensive documentation and intermediate training checkpoints published alongside the final weights.
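
Since the intermediate checkpoints are published as separate Git revisions of the model repository, they can be pulled with the standard revision argument. The branch name below is purely illustrative (the real names are listed on the repository), and the -hf checkpoint ID is again an assumption.

```python
# Hedged sketch: loading an intermediate training checkpoint by Git revision.
# The revision string is illustrative; consult the model repo for actual branch names.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1B-hf",
    revision="step20000-tokens84B",  # example branch name, not verified for OLMo-1B
)
```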

Q: What are the recommended use cases?

The model is well-suited to research applications and text generation, and serves as a foundation for fine-tuning on specific domains. It performs particularly well on tasks like COPA (79%) and PIQA (73.7%).
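
As a rough illustration of the fine-tuning path, the sketch below runs a short causal-LM fine-tune with the Hugging Face Trainer. The dataset, hyperparameters, and checkpoint ID are placeholder assumptions, not recommendations from this card.

```python
# Hedged sketch: domain fine-tuning OLMo-1B with the Hugging Face Trainer.
# Dataset, hyperparameters, and checkpoint ID are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "allenai/OLMo-1B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
if tokenizer.pad_token is None:  # make sure padding is defined for the collator
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus: any dataset with a "text" column works the same way.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
raw = raw.filter(lambda ex: len(ex["text"].strip()) > 0)  # drop empty lines

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="olmo-1b-domain-ft",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```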

🍰 Interested in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.