# OLMo-2-1124-7B
| Property | Value |
|---|---|
| Parameter Count | 7.3B |
| Training Tokens | 4 trillion |
| Context Length | 4,096 tokens |
| License | Apache 2.0 |
| Architecture | 32 layers, 4096 hidden size, 32 attention heads |
## What is OLMo-2-1124-7B?
OLMo-2-1124-7B is an open language model developed by the Allen Institute for AI (Ai2). It represents a significant improvement over its predecessor, including a 9-point gain on MMLU. The model is part of the OLMo (Open Language Model) series, which is designed to advance the science of language models while keeping the full training pipeline transparent and accessible.
## Implementation Details
The model employs a two-stage training approach: an initial pretraining phase on the OLMo-Mix-1124 dataset covering 4 trillion tokens, followed by a second training stage on the Dolmino-Mix-1124 dataset. The architecture uses 32 transformer layers with a hidden size of 4096 and 32 attention heads, balancing performance and efficiency.
- Comprehensive, staged training over 4 trillion tokens
- Final checkpoints merged via the "model souping" technique
- Supports 8-bit quantization for memory-efficient deployment
- Full integration with the Hugging Face Transformers library (see the loading sketch below)
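As a minimal sketch of the Transformers integration and 8-bit support listed above: the snippet below loads the model and optionally quantizes it. The repo id `allenai/OLMo-2-1124-7B`, a recent `transformers` version with OLMo 2 support, and the availability of `bitsandbytes` on your machine are assumptions, not guarantees from this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "allenai/OLMo-2-1124-7B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Optional 8-bit quantization (requires the bitsandbytes package and a GPU).
quant_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,  # omit this argument for full precision
    device_map="auto",
)

# Sanity-check the architecture figures from the table above.
assert model.config.num_hidden_layers == 32
assert model.config.hidden_size == 4096
assert model.config.num_attention_heads == 32
```

Dropping the `quantization_config` argument loads the model in full precision; 8-bit mode trades a small amount of accuracy for a roughly halved memory footprint relative to fp16.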
## Core Capabilities
- Achieves a 62.9% average score across major benchmarks
- Strong performance on reasoning tasks (79.8% on ARC-Challenge)
- Robust mathematical capabilities (67.5% on GSM8k)
- Effective natural language understanding (83.8% on HellaSwag)
## Frequently Asked Questions
Q: What makes this model unique?
OLMo-2-1124-7B stands out for its fully open release (weights, training data, and code are all publicly available), comprehensive documentation, and significant performance improvements over previous versions. It combines extensive pretraining with a staged second-phase training mix, and its Apache 2.0 license makes it suitable for both research and commercial applications.
Q: What are the recommended use cases?
The model excels at a range of language tasks, including reasoning, question answering, and mathematical problem-solving. It is particularly well suited to research, academic applications, and downstream applications that require strong language understanding; a minimal generation sketch follows.
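To make the question-answering use case concrete, here is a minimal generation sketch. It assumes the same `allenai/OLMo-2-1124-7B` repo id as above and treats the model as a base (non-chat) model, so the plain-completion prompt format is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "allenai/OLMo-2-1124-7B"  # assumed repo id, as above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Base-model prompting: a plain completion prompt, not a chat template.
prompt = "Q: A train travels 180 miles in 3 hours. What is its average speed?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps arithmetic answers deterministic; max_new_tokens
# bounds the output well inside the 4,096-token context window.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```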