# Light-R1-32B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-32B-Instruct |
| License | Apache 2.0 |
| Training Cost | ~$1000 (6 hours on 12 × H800 GPUs) |
| AIME24 Score | 76.6 (64-run average) |
## What is Light-R1-32B?
Light-R1-32B is a mathematical reasoning model that achieves state-of-the-art performance on challenging mathematics competitions such as AIME. Built on Qwen2.5-32B-Instruct, it reaches this performance through a curriculum learning approach that combines Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).
## Implementation Details
The model is trained in three stages: curriculum SFT stage 1 on 76k examples, SFT stage 2 on 3k harder problems, and a final DPO stage. The training data is curated from public math datasets and decontaminated against common benchmarks; the staging logic is sketched below.
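As a rough illustration of the curriculum split, here is a minimal sketch. It assumes each problem carries a `pass_rate` difficulty estimate from a reference model; the field name and threshold are hypothetical, not the authors' exact criterion.

```python
# Hypothetical sketch of curriculum staging: partition a problem pool into an
# easier stage-1 SFT set and a harder stage-2 SFT set by estimated difficulty.
def build_curriculum(problems, hard_threshold=0.2):
    """Problems a reference model rarely solves go to stage 2."""
    stage1 = [p for p in problems if p["pass_rate"] >= hard_threshold]
    stage2 = [p for p in problems if p["pass_rate"] < hard_threshold]
    return stage1, stage2

problems = [
    {"question": "AIME-style problem ...", "pass_rate": 0.6},
    {"question": "Harder olympiad problem ...", "pass_rate": 0.05},
]
stage1_data, stage2_data = build_curriculum(problems)
# Stage 1: SFT on the ~76k pool; stage 2: SFT on the ~3k hardest; then DPO.
```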
- Utilizes curriculum learning with progressive difficulty levels
- Implements forced thinking via a special `<think>` token (see the first sketch after this list)
- Leverages model merging for optimal performance (see the second sketch after this list)
- Trained on decontaminated mathematical datasets
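The forced-thinking bullet can be made concrete with a prompt that pre-fills the opening `<think>` tag, so generation must begin inside a reasoning block. A minimal sketch using the Qwen-style chat format; the released model's exact template may differ.

```python
# Hypothetical sketch: pre-fill the assistant turn with <think> so the model
# must start by generating chain-of-thought before its final answer.
def force_thinking_prompt(question: str) -> str:
    return (
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n"  # generation resumes mid-reasoning
    )

print(force_thinking_prompt("How many positive divisors does 2024 have?"))
```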
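Model merging here typically means weight-space interpolation of fine-tuned checkpoints. A generic sketch of linear merging ("model soup"), not necessarily the authors' exact recipe:

```python
import torch

def merge_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate matching parameter tensors from two checkpoints."""
    assert sd_a.keys() == sd_b.keys(), "checkpoints must share an architecture"
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a}

# Tiny demo with toy tensors; real use would load full model checkpoints.
a = {"w": torch.ones(2, 2)}
b = {"w": torch.zeros(2, 2)}
print(merge_state_dicts(a, b)["w"])  # tensor of 0.5s
```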
## Core Capabilities
- 76.6% accuracy on AIME24 (averaged over 64 runs)
- 64.6% accuracy on AIME25
- 61.8% accuracy on GPQA Diamond
- Strong mathematical reasoning and step-by-step problem solving
## Frequently Asked Questions
Q: What makes this model unique?
Light-R1-32B delivers strong performance on mathematical reasoning tasks while being trained for a fraction of the cost of comparable models (~$1000). It is also fully open source, with training code and datasets publicly available.
Q: What are the recommended use cases?
The model excels at competition-level mathematical problem solving. It is designed for scenarios that require detailed mathematical reasoning and step-by-step solution generation; a minimal usage sketch follows.
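A minimal inference sketch with Hugging Face transformers. The repository id, prompt, and generation settings are assumptions; check the official model page before use.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/Light-R1-32B"  # assumed Hub id; verify against the official release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "Find the remainder when 2^100 is divided by 7."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning traces are long, so allow a generous token budget.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```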