# INTELLECT-1-Instruct
| Property | Value |
| --- | --- |
| Parameter Count | 10 billion |
| Training Tokens | 1 trillion |
| Context Length | 8,192 tokens |
| License | Apache 2.0 |
| Release Date | 29 Nov 2024 |
## What is INTELLECT-1-Instruct?
INTELLECT-1-Instruct is a 10B-parameter language model trained collaboratively across three continents. Built on the open-source prime framework, it demonstrates that large-scale distributed training is practical, with compute contributed by 30 independent community members.
## Implementation Details
The model was trained with advanced distributed techniques, including the ElasticDeviceMesh for fault-tolerant communication and the DiLoCo algorithm with 100 inner steps between synchronizations. The training data mix is 55% fineweb-edu, 20% Stack V1, and a remainder of other high-quality sources.
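The DiLoCo loop mentioned above can be sketched with a toy simulation: each worker runs many cheap local SGD steps, and the only communication is an averaged parameter delta (the "pseudo-gradient") applied by an outer optimizer. This is an illustrative sketch, not the actual prime implementation; the quadratic objective, worker targets, and outer learning rate are hypothetical.

```python
# Toy DiLoCo-style loop: two simulated workers each minimize
# f(x) = (x - target)^2 locally, syncing only once per outer step.

INNER_STEPS = 100   # INTELLECT-1 used 100 inner steps between syncs
OUTER_STEPS = 10
LR_INNER = 0.05
LR_OUTER = 0.7      # hypothetical outer learning rate

targets = [3.0, 5.0]   # each worker sees slightly different "data"
global_param = 0.0

for _ in range(OUTER_STEPS):
    pseudo_grads = []
    for target in targets:
        local = global_param
        for _ in range(INNER_STEPS):          # cheap local SGD steps
            grad = 2.0 * (local - target)
            local -= LR_INNER * grad
        pseudo_grads.append(global_param - local)  # parameter delta
    # the only communication: all-reduce (average) the pseudo-gradients
    avg = sum(pseudo_grads) / len(pseudo_grads)
    global_param -= LR_OUTER * avg            # outer optimizer step

print(round(global_param, 2))  # → 4.0, the consensus between workers
```

With 100 inner steps per sync, communication happens 100x less often than in standard data-parallel training, which is the core bandwidth saving DiLoCo provides.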
- 42 layers with 32 attention heads
- 4096 hidden size dimension
- 128256 vocabulary size
- Custom int8 all-reduce kernels, reducing communication overhead by up to 400x
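The int8 compression in the last bullet can be illustrated with a toy symmetric quantizer. The helper names and values here are hypothetical, and the real kernels operate on GPU tensors, but the idea is the same: shrink each float32 value to one byte before the all-reduce, then rescale after.

```python
# Toy symmetric int8 quantization of a pseudo-gradient before all-reduce.

def quantize_int8(values):
    """Map floats to int8 range [-127, 127] with one shared scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    return [x * scale for x in q]

pseudo_grad = [0.5, -1.25, 3.0, -0.01]
q, scale = quantize_int8(pseudo_grad)
restored = dequantize_int8(q, scale)

# int8 payload is 4x smaller than float32; combined with syncing only
# once every 100 inner steps, per-step bandwidth drops by roughly
# 100 x 4 = 400x (one plausible reading of the reported figure).
max_err = max(abs(a - b) for a, b in zip(pseudo_grad, restored))
print(q, round(max_err, 4))  # → [21, -53, 127, 0] 0.01
```

The per-element error is bounded by about half the scale, which is why quantizing pseudo-gradients (small deltas) is gentler than quantizing raw weights.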
## Core Capabilities
- Strong performance on MMLU (49.89%) and GSM8K (38.58%)
- Enhanced knowledge retrieval and grade school math capabilities
- Robust instruction following and reasoning abilities
- Extensive post-training optimization through 16 supervised fine-tuning (SFT) runs
## Frequently Asked Questions
Q: What makes this model unique?
A: Its distributed training architecture and collaborative development set it apart, along with a sophisticated post-training process that includes multiple SFT and DPO runs.
Q: What are the recommended use cases?
A: INTELLECT-1-Instruct excels at instruction following, mathematical reasoning, and knowledge-intensive tasks. It is particularly well suited to educational applications and complex reasoning scenarios.