OLMo-7B

Maintained by: allenai

Property          Value
Parameter Count   6.89B
Training Tokens   2.5 Trillion
License           Apache 2.0
Paper             Research Paper
Authors           Allen Institute for AI (AI2)

What is OLMo-7B?

OLMo-7B is part of the Open Language Model (OLMo) series, designed to advance the science of language models. Trained on the Dolma dataset, it represents a significant step in open-source AI development, with 6.89B parameters and a decoder-only transformer architecture of 32 layers and a 4096-dimensional hidden size.

Implementation Details

The architecture has 32 attention heads and employs non-parametric LayerNorm and rotary positional embeddings (RoPE). Its feed-forward blocks use the SwiGLU activation function, and the model supports a context length of 2048 tokens; a sketch of the SwiGLU block follows the list below.

  • Full attention mechanism without bias terms
  • Sequential block type architecture
  • Trained with AdamW optimizer (LR: 3.0E-4)
  • Batch size of 2160 instances (~4M tokens)
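
To illustrate the bias-free SwiGLU design mentioned above, here is a minimal PyTorch sketch of a SwiGLU feed-forward block. The dimensions and module names are illustrative and are not taken from the OLMo codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Bias-free SwiGLU MLP: out = W2(SiLU(W1 x) * W3 x)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # No bias terms, matching the bias-free design noted above.
        self.w1 = nn.Linear(d_model, d_hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(d_model, d_hidden, bias=False)  # up projection
        self.w2 = nn.Linear(d_hidden, d_model, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

# Shape check with small illustrative dimensions (OLMo-7B's real hidden size is 4096).
x = torch.randn(2, 16, 128)
print(SwiGLUFeedForward(d_model=128, d_hidden=512)(x).shape)  # torch.Size([2, 16, 128])
```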

Core Capabilities

  • Strong performance on core NLP benchmarks (71.6% average across core tasks)
  • Competitive results on ARC, COPA, and PIQA benchmarks
  • Efficient text generation and completion
  • Support for both inference and fine-tuning
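
Below is a minimal inference sketch, assuming the Hugging Face transformers library. The repo ID is taken from this card; the original checkpoint loads through custom modeling code (it may require the ai2-olmo package plus trust_remote_code=True), while the natively supported allenai/OLMo-7B-hf variant works with stock transformers.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # repo ID taken from this card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Language modeling is"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampled completion; prompt plus output must fit the 2048-token context window.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```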

Frequently Asked Questions

Q: What makes this model unique?

OLMo-7B stands out for its complete transparency: the training process, architecture, and evaluation metrics are fully documented, and all code, checkpoints, and training logs are openly available. It is designed specifically to advance language model research.
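
For instance, intermediate training checkpoints can typically be loaded from the same repository through the revision argument; the branch name below is a hypothetical example, so check the repository's branch list for the actual naming scheme.

```python
from transformers import AutoModelForCausalLM

# Load an intermediate training checkpoint instead of the final weights.
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B",
    revision="step1000-tokens4B",  # hypothetical branch name; see the repo for real ones
    trust_remote_code=True,
)
```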

Q: What are the recommended use cases?

The model excels at language modeling tasks and research applications, and it can be fine-tuned for specific downstream tasks. It is particularly suitable for academic research and for applications that require transparent, reproducible results.
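
A bare-bones causal-LM fine-tuning sketch follows, assuming the checkpoint's causal-LM wrapper accepts a labels argument as the native transformers ports do. The corpus and hyperparameters are placeholders; fine-tuning a 7B model realistically also calls for mixed precision and multiple GPUs, or parameter-efficient methods such as LoRA.

```python
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.train()

# Placeholder corpus; swap in your downstream-task text.
texts = ["Example training document one.", "Example training document two."]
optimizer = AdamW(model.parameters(), lr=2e-5)

for text in texts:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    # For causal-LM fine-tuning the labels are the input ids themselves.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```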
