OLMo-2-1124-13B
| Property | Value |
|---|---|
| Parameter Count | 13.7B |
| Training Tokens | 5 Trillion |
| Context Length | 4096 tokens |
| License | Apache 2.0 |
| Language | English |
What is OLMo-2-1124-13B?
OLMo-2-1124-13B is a state-of-the-art language model developed by the Allen Institute for AI (Ai2) as part of its Open Language Model (OLMo) initiative. This 13B-parameter model represents a significant step forward for fully open models, trained on 5 trillion tokens and designed to compete with leading models in the field.
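The base model can be loaded with the Hugging Face Transformers library. The snippet below is a minimal sketch, assuming a transformers release with OLMo 2 support, the accelerate package for device placement, and enough GPU memory for the 13B weights; the prompt and sampling settings are illustrative only.

```python
# Minimal sketch: loading OLMo-2-1124-13B and generating a short completion.
# Assumes a transformers version with OLMo 2 support and `accelerate` installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMo-2-1124-13B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # spread weights across available GPUs/CPU
    torch_dtype="auto",  # use the checkpoint's native precision
)

inputs = tokenizer("Language modeling is ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```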
Implementation Details
The model features a 40-layer transformer architecture with a hidden size of 5120 and 40 attention heads. It underwent a two-stage training process: initial pretraining on OLMo-Mix-1124 (roughly 1.2 epochs), followed by a mid-training (annealing) stage on the Dolmino-Mix-1124 dataset. The final model is a merge ("soup") of multiple mid-training runs: three trained on 100B tokens each and one trained on 300B tokens.
- Advanced model architecture with 40 transformer layers
- Comprehensive training on diverse high-quality datasets
- Can be loaded with standard quantization options to reduce memory footprint (see the sketch after this list)
- 4096 token context window
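The sketch below shows one way to confirm the reported architecture and to load a quantized copy of the model. It assumes the bitsandbytes and accelerate packages are installed; the 4-bit settings shown are illustrative rather than an official recommendation.

```python
# Sketch: inspecting the architecture and loading a 4-bit quantized variant.
# Assumes transformers with OLMo 2 support plus `bitsandbytes` and `accelerate`;
# the quantization settings below are illustrative, not an official recommendation.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "allenai/OLMo-2-1124-13B"

# Confirm the reported architecture (40 layers, hidden size 5120, 40 attention heads).
config = AutoConfig.from_pretrained(model_name)
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)

# Load in 4-bit so the 13B model fits on a much smaller GPU than full precision requires.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```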
Core Capabilities
- Competitive performance with leading models on English academic benchmarks
- Strong results on tasks such as ARC, MMLU, and TriviaQA (a benchmarking sketch follows this list)
- Achieves 68.3% average score across major benchmarks
- Solid performance on mathematical reasoning and natural language understanding tasks
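One way to reproduce numbers on benchmarks like these is with EleutherAI's lm-evaluation-harness. The sketch below assumes a recent (v0.4+) release of that library; the task names and settings are illustrative and not necessarily the evaluation setup Ai2 used for its reported scores.

```python
# Rough sketch: benchmarking with EleutherAI's lm-evaluation-harness (v0.4+ API assumed).
# Task selection and settings are illustrative, not Ai2's official evaluation setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=allenai/OLMo-2-1124-13B,dtype=bfloat16",
    tasks=["arc_challenge", "mmlu", "triviaqa"],
    batch_size=8,
)
print(results["results"])
```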
Frequently Asked Questions
Q: What makes this model unique?
OLMo-2-1124-13B stands out for its fully open nature combined with strong performance. It is trained on a carefully curated dataset mix, and the training data, code, intermediate checkpoints, and final weights are all released, giving complete transparency into the training process and model architecture.
Q: What are the recommended use cases?
The model excels in academic and research applications, particularly tasks requiring deep language understanding, mathematical reasoning, and complex problem-solving. As a base (non-instruction-tuned) model, it is best used with completion-style prompting or as a starting point for fine-tuning, and it is suitable for both research purposes and practical natural language processing applications.
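As a brief illustration of completion-style prompting for a reasoning task, the sketch below uses the Hugging Face text-generation pipeline; the prompt and generation settings are arbitrary examples, not recommended defaults.

```python
# Sketch: completion-style prompting through the Hugging Face text-generation pipeline.
# The prompt and generation settings are arbitrary examples.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="allenai/OLMo-2-1124-13B",
    device_map="auto",
    torch_dtype="auto",
)

prompt = "Question: A train travels 120 km in 1.5 hours. What is its average speed?\nAnswer:"
print(generator(prompt, max_new_tokens=64, do_sample=False)[0]["generated_text"])
```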