# INTELLECT-1-Instruct
| Property | Value |
| --- | --- |
| Parameter Count | 10 billion |
| Training Tokens | 1 trillion |
| Context Length | 8,192 tokens |
| License | Apache 2.0 |
| Release Date | 29 Nov 2024 |
## What is INTELLECT-1-Instruct?
INTELLECT-1-Instruct is a 10B-parameter language model trained collaboratively across three continents. Built on the open-source prime framework, it demonstrates that large-scale distributed training is practical, with compute contributed by 30 independent community members.
## Implementation Details
The model was trained with advanced distributed techniques, including the ElasticDeviceMesh for fault-tolerant communication and the DiLoCo algorithm with 100 inner steps between synchronizations. The training data mix is 55% fineweb-edu, 20% Stack V1, and a remainder of other high-quality sources.
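The DiLoCo loop mentioned above can be sketched with a toy simulation: each worker runs many cheap local SGD steps, and the only communication is an averaged parameter delta (the "pseudo-gradient") applied by an outer optimizer. This is an illustrative sketch, not the actual prime implementation; the quadratic objective, worker targets, and outer learning rate are hypothetical.

```python
# Toy DiLoCo-style loop: two simulated workers each minimize
# f(x) = (x - target)^2 locally, syncing only once per outer step.

INNER_STEPS = 100   # INTELLECT-1 used 100 inner steps between syncs
OUTER_STEPS = 10
LR_INNER = 0.05
LR_OUTER = 0.7      # hypothetical outer learning rate

targets = [3.0, 5.0]   # each worker sees slightly different "data"
global_param = 0.0

for _ in range(OUTER_STEPS):
    pseudo_grads = []
    for target in targets:
        local = global_param
        for _ in range(INNER_STEPS):          # cheap local SGD steps
            grad = 2.0 * (local - target)
            local -= LR_INNER * grad
        pseudo_grads.append(global_param - local)  # parameter delta
    # the only communication: all-reduce (average) the pseudo-gradients
    avg = sum(pseudo_grads) / len(pseudo_grads)
    global_param -= LR_OUTER * avg            # outer optimizer step

print(round(global_param, 2))  # → 4.0, the consensus between workers
```

With 100 inner steps per sync, communication happens 100x less often than in standard data-parallel training, which is the core bandwidth saving DiLoCo provides.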
- 42 layers with 32 attention heads
- 4096 hidden size dimension
- 128256 vocabulary size
- Custom int8 all-reduce kernels, reducing communication overhead by up to 400x
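The int8 compression in the last bullet can be illustrated with a toy symmetric quantizer. The helper names and values here are hypothetical, and the real kernels operate on GPU tensors, but the idea is the same: shrink each float32 value to one byte before the all-reduce, then rescale after.

```python
# Toy symmetric int8 quantization of a pseudo-gradient before all-reduce.

def quantize_int8(values):
    """Map floats to int8 range [-127, 127] with one shared scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    return [x * scale for x in q]

pseudo_grad = [0.5, -1.25, 3.0, -0.01]
q, scale = quantize_int8(pseudo_grad)
restored = dequantize_int8(q, scale)

# int8 payload is 4x smaller than float32; combined with syncing only
# once every 100 inner steps, per-step bandwidth drops by roughly
# 100 x 4 = 400x (one plausible reading of the reported figure).
max_err = max(abs(a - b) for a, b in zip(pseudo_grad, restored))
print(q, round(max_err, 4))  # → [21, -53, 127, 0] 0.01
```

The per-element error is bounded by about half the scale, which is why quantizing pseudo-gradients (small deltas) is gentler than quantizing raw weights.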
## Core Capabilities
- Strong performance on MMLU (49.89%) and GSM8K (38.58%)
- Enhanced knowledge retrieval and grade school math capabilities
- Robust instruction following and reasoning abilities
- Extensive post-training optimization through 16 supervised fine-tuning (SFT) runs
## Frequently Asked Questions
Q: What makes this model unique?
A: Its distributed training architecture and collaborative development set it apart, along with a sophisticated post-training process that includes multiple SFT and DPO runs.
Q: What are the recommended use cases?
A: INTELLECT-1-Instruct excels at instruction following, mathematical reasoning, and knowledge-intensive tasks. It is particularly well suited to educational applications and complex reasoning scenarios.