# Light-R1-32B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-32B-Instruct |
| License | Apache 2.0 |
| Training Cost | ~$1000 (6 hours on 12 × H800 GPUs) |
| AIME24 Score | 76.6 (64-run average) |
## What is Light-R1-32B?
Light-R1-32B is a mathematical reasoning model that achieves state-of-the-art performance on challenging mathematics competitions such as AIME. Built on Qwen2.5-32B-Instruct, it reaches this performance through a curriculum learning approach that combines Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).
## Implementation Details
The model is trained in three stages: curriculum SFT stage 1 on 76k examples, SFT stage 2 on 3k harder problems, and a final DPO stage. The training data is curated from public math datasets and decontaminated against common benchmarks; the staging logic is sketched below.
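As a rough illustration of the curriculum split, here is a minimal sketch. It assumes each problem carries a `pass_rate` difficulty estimate from a reference model; the field name and threshold are hypothetical, not the authors' exact criterion.

```python
# Hypothetical sketch of curriculum staging: partition a problem pool into an
# easier stage-1 SFT set and a harder stage-2 SFT set by estimated difficulty.
def build_curriculum(problems, hard_threshold=0.2):
    """Problems a reference model rarely solves go to stage 2."""
    stage1 = [p for p in problems if p["pass_rate"] >= hard_threshold]
    stage2 = [p for p in problems if p["pass_rate"] < hard_threshold]
    return stage1, stage2

problems = [
    {"question": "AIME-style problem ...", "pass_rate": 0.6},
    {"question": "Harder olympiad problem ...", "pass_rate": 0.05},
]
stage1_data, stage2_data = build_curriculum(problems)
# Stage 1: SFT on the ~76k pool; stage 2: SFT on the ~3k hardest; then DPO.
```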
- Utilizes curriculum learning with progressive difficulty levels
- Implements forced thinking via a special `<think>` token (see the first sketch after this list)
- Leverages model merging for optimal performance (see the second sketch after this list)
- Trained on decontaminated mathematical datasets
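The forced-thinking bullet can be made concrete with a prompt that pre-fills the opening `<think>` tag, so generation must begin inside a reasoning block. A minimal sketch using the Qwen-style chat format; the released model's exact template may differ.

```python
# Hypothetical sketch: pre-fill the assistant turn with <think> so the model
# must start by generating chain-of-thought before its final answer.
def force_thinking_prompt(question: str) -> str:
    return (
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n"  # generation resumes mid-reasoning
    )

print(force_thinking_prompt("How many positive divisors does 2024 have?"))
```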
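Model merging here typically means weight-space interpolation of fine-tuned checkpoints. A generic sketch of linear merging ("model soup"), not necessarily the authors' exact recipe:

```python
import torch

def merge_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate matching parameter tensors from two checkpoints."""
    assert sd_a.keys() == sd_b.keys(), "checkpoints must share an architecture"
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a}

# Tiny demo with toy tensors; real use would load full model checkpoints.
a = {"w": torch.ones(2, 2)}
b = {"w": torch.zeros(2, 2)}
print(merge_state_dicts(a, b)["w"])  # tensor of 0.5s
```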
## Core Capabilities
- 76.6% accuracy on AIME24 (averaged over 64 runs)
- 64.6% accuracy on AIME25
- 61.8% accuracy on GPQA Diamond
- Strong mathematical reasoning and step-by-step problem solving
## Frequently Asked Questions
Q: What makes this model unique?
Light-R1-32B delivers strong performance on mathematical reasoning tasks while being trained for a fraction of the cost of comparable models (~$1000). It is also fully open source, with training code and datasets publicly available.
Q: What are the recommended use cases?
The model excels at competition-level mathematical problem solving. It is designed for scenarios that require detailed mathematical reasoning and step-by-step solution generation; a minimal usage sketch follows.
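A minimal inference sketch with Hugging Face transformers. The repository id, prompt, and generation settings are assumptions; check the official model page before use.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/Light-R1-32B"  # assumed Hub id; verify against the official release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "Find the remainder when 2^100 is divided by 7."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning traces are long, so allow a generous token budget.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```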