OLMo-1B
| Property | Value |
|---|---|
| Parameter Count | 1.18B parameters |
| Training Tokens | 3 trillion |
| License | Apache 2.0 |
| Paper | arXiv:2402.00838 |
| Architecture | 16 layers, 2048 hidden size, 16 attention heads |
What is OLMo-1B?
OLMo-1B is part of the Open Language Model (OLMo) series developed by the Allen Institute for AI to enable transparent language model research. It was trained on the Dolma dataset and represents a significant step toward open science in AI development. With 1.18B parameters trained on 3 trillion tokens, it is a compact yet capable language model.
Implementation Details
The model uses a decoder-only Transformer architecture with several specific choices: non-parametric LayerNorm (no learnable scale or bias), RoPE positional embeddings, and full attention. It was trained with the AdamW optimizer at a peak learning rate of 4.0E-4, and the released weights are stored in F32. A minimal sketch of how these pieces fit together follows the list below.
- Context Length: 2048 tokens
- Hidden Size: 2048
- Attention Heads: 16
- Number of Layers: 16
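To make these choices concrete, here is a minimal PyTorch sketch of how the listed dimensions relate and what non-parametric LayerNorm means in practice. The constant names are illustrative assumptions, not OLMo's actual code.

```python
import torch
import torch.nn as nn

# Illustrative constants matching the values listed above (not OLMo's real code).
HIDDEN_SIZE = 2048
NUM_LAYERS = 16
NUM_HEADS = 16
CONTEXT_LENGTH = 2048
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 128 dimensions per attention head

# "Non-parametric LayerNorm" means LayerNorm with no learnable scale or bias;
# in PyTorch this corresponds to disabling the elementwise affine transform.
non_parametric_ln = nn.LayerNorm(HIDDEN_SIZE, elementwise_affine=False)

x = torch.randn(1, CONTEXT_LENGTH, HIDDEN_SIZE)  # (batch, sequence, hidden)
y = non_parametric_ln(x)

print(y.shape)                                                 # torch.Size([1, 2048, 2048])
print(sum(p.numel() for p in non_parametric_ln.parameters()))  # 0 -> no learnable parameters
```

Positional information is injected through RoPE inside the attention layers rather than through learned position embeddings, so the position handling adds no parameters either.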
Core Capabilities
- Strong performance on core NLP tasks with 62.42% average accuracy across standard benchmarks
- Efficient text generation and completion
- Support for both inference and fine-tuning workflows
- Integration with popular ML frameworks through Hugging Face compatibility (see the loading sketch after this list)
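As a sketch of the Hugging Face integration mentioned above, the snippet below loads the model and generates a short completion. It assumes the Transformers-native checkpoint `allenai/OLMo-1B-hf` and a recent `transformers` release; the original `allenai/OLMo-1B` repository instead requires the `ai2-olmo` package and `trust_remote_code=True`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMo-1B-hf"  # assumed Transformers-native checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Simple completion: encode a prompt, sample up to 50 new tokens, decode.
inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```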
Frequently Asked Questions
Q: What makes this model unique?
OLMo-1B stands out for its complete transparency: the training process, data sources, and evaluation metrics are all openly documented. It is designed specifically for research, with extensive documentation and intermediate training checkpoints available.
Q: What are the recommended use cases?
The model is well-suited for research applications, text generation tasks, and domain-specific fine-tuning (a fine-tuning sketch follows below). It performs particularly well on tasks such as COPA (79%) and PIQA (73.7%).
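For domain fine-tuning, here is a hedged sketch using the Hugging Face Trainer and causal language modeling. The dataset, hyperparameters, and output path are placeholders for illustration, not recommendations from the OLMo authors.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "allenai/OLMo-1B-hf"  # assumed Transformers-native checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The collator pads batches, so make sure a pad token is defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus; swap in your own domain text.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.filter(lambda ex: len(ex["text"].strip()) > 0)

def tokenize(batch):
    # Truncate to the model's 2048-token context window.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="olmo-1b-finetuned",      # placeholder output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```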