OLMo-2-1124-13B
| Property | Value |
|---|---|
| Parameter Count | 13.7B |
| Training Tokens | 5 Trillion |
| Context Length | 4096 tokens |
| License | Apache 2.0 |
| Language | English |
What is OLMo-2-1124-13B?
OLMo-2-1124-13B is a state-of-the-art language model developed by the Allen Institute for AI (Ai2) as part of its Open Language Model (OLMo) initiative. This 13B-parameter model represents a significant step forward for fully open models, trained on 5 trillion tokens and designed to compete with leading models in the field.
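The base model can be loaded with the Hugging Face Transformers library. The snippet below is a minimal sketch, assuming a transformers release with OLMo 2 support, the accelerate package for device placement, and enough GPU memory for the 13B weights; the prompt and sampling settings are illustrative only.

```python
# Minimal sketch: loading OLMo-2-1124-13B and generating a short completion.
# Assumes a transformers version with OLMo 2 support and `accelerate` installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMo-2-1124-13B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # spread weights across available GPUs/CPU
    torch_dtype="auto",  # use the checkpoint's native precision
)

inputs = tokenizer("Language modeling is ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```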
Implementation Details
The model features a 40-layer transformer architecture with a hidden size of 5120 and 40 attention heads. It underwent a two-stage training process: initial pretraining on OLMo-Mix-1124 (roughly 1.2 epochs), followed by a mid-training (annealing) stage on the Dolmino-Mix-1124 dataset. The final model is a merge ("soup") of multiple mid-training runs: three trained on 100B tokens each and one trained on 300B tokens.
- Advanced model architecture with 40 transformer layers
- Comprehensive training on diverse high-quality datasets
- Can be loaded with standard quantization options to reduce memory footprint (see the sketch after this list)
- 4096 token context window
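The sketch below shows one way to confirm the reported architecture and to load a quantized copy of the model. It assumes the bitsandbytes and accelerate packages are installed; the 4-bit settings shown are illustrative rather than an official recommendation.

```python
# Sketch: inspecting the architecture and loading a 4-bit quantized variant.
# Assumes transformers with OLMo 2 support plus `bitsandbytes` and `accelerate`;
# the quantization settings below are illustrative, not an official recommendation.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "allenai/OLMo-2-1124-13B"

# Confirm the reported architecture (40 layers, hidden size 5120, 40 attention heads).
config = AutoConfig.from_pretrained(model_name)
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)

# Load in 4-bit so the 13B model fits on a much smaller GPU than full precision requires.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```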
Core Capabilities
- Competitive performance with leading models on English academic benchmarks
- Strong results on tasks such as ARC, MMLU, and TriviaQA (a benchmarking sketch follows this list)
- Achieves 68.3% average score across major benchmarks
- Solid performance on mathematical reasoning and natural language understanding tasks
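One way to reproduce numbers on benchmarks like these is with EleutherAI's lm-evaluation-harness. The sketch below assumes a recent (v0.4+) release of that library; the task names and settings are illustrative and not necessarily the evaluation setup Ai2 used for its reported scores.

```python
# Rough sketch: benchmarking with EleutherAI's lm-evaluation-harness (v0.4+ API assumed).
# Task selection and settings are illustrative, not Ai2's official evaluation setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=allenai/OLMo-2-1124-13B,dtype=bfloat16",
    tasks=["arc_challenge", "mmlu", "triviaqa"],
    batch_size=8,
)
print(results["results"])
```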
Frequently Asked Questions
Q: What makes this model unique?
OLMo-2-1124-13B stands out for its fully open nature combined with strong performance. It is trained on a carefully curated dataset mix, and the training data, code, intermediate checkpoints, and final weights are all released, giving complete transparency into the training process and model architecture.
Q: What are the recommended use cases?
The model excels in academic and research applications, particularly tasks requiring deep language understanding, mathematical reasoning, and complex problem-solving. As a base (non-instruction-tuned) model, it is best used with completion-style prompting or as a starting point for fine-tuning, and it is suitable for both research purposes and practical natural language processing applications.
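As a brief illustration of completion-style prompting for a reasoning task, the sketch below uses the Hugging Face text-generation pipeline; the prompt and generation settings are arbitrary examples, not recommended defaults.

```python
# Sketch: completion-style prompting through the Hugging Face text-generation pipeline.
# The prompt and generation settings are arbitrary examples.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="allenai/OLMo-2-1124-13B",
    device_map="auto",
    torch_dtype="auto",
)

prompt = "Question: A train travels 120 km in 1.5 hours. What is its average speed?\nAnswer:"
print(generator(prompt, max_new_tokens=64, do_sample=False)[0]["generated_text"])
```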