Light-R1-32B-DS

qihoo360

A 32B-parameter math-focused model that achieves near-SOTA performance on AIME benchmarks, trained on only 3K data points starting from DeepSeek-R1-Distill-Qwen-32B

Property	Value
Base Model	DeepSeek-R1-Distill-Qwen-32B
Release Date	March 12, 2025
Paper	View Paper
AIME24 Score	78.1%
AIME25 Score	65.9%

What is Light-R1-32B-DS?

Light-R1-32B-DS is a 32B-parameter mathematical reasoning model with near-state-of-the-art results on challenging math benchmarks: 78.1% on AIME24 and 65.9% on AIME25. It is built on DeepSeek-R1-Distill-Qwen-32B and reaches these scores after training on a small dataset of just 3,000 examples.

Implementation Details

The model keeps the architecture of its base model unchanged. The training data was decontaminated against the evaluation benchmarks using two checks: exact matching (excluding digits) and 32-gram matching.

Achieves 78.1% on AIME24 and 65.9% on AIME25
Scores 68.0% on GPQA
Decontaminates training data with exact matching (excluding digits) and 32-gram matching
Built on the DeepSeek-R1-Distill-Qwen-32B architecture
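
The two decontamination checks named above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the whitespace tokenization, the lowercasing, and the reading of "excluding digits" as "compare texts after removing all digits" are assumptions made for illustration.

```python
import re
from typing import List, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase, remove digits, and collapse whitespace (assumed normalization)."""
    no_digits = re.sub(r"\d+", "", text.lower())
    return " ".join(no_digits.split())


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of n-grams over whitespace tokens (assumed tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(sample: str, benchmark: List[str], n: int = 32) -> bool:
    """Flag a sample that matches a benchmark item under either check."""
    sample_norm = normalize(sample)
    sample_grams = ngrams(sample, n)
    for question in benchmark:
        # Check 1: exact match once digits are excluded.
        if sample_norm == normalize(question):
            return True
        # Check 2: any shared n-gram (n = 32 in the model card).
        if sample_grams & ngrams(question, n):
            return True
    return False


def decontaminate(train: List[str], benchmark: List[str], n: int = 32) -> List[str]:
    """Keep only the training samples that pass both checks."""
    return [s for s in train if not is_contaminated(s, benchmark, n)]
```

Removing digits before the exact-match check catches benchmark problems that were copied with only the numbers changed, while the n-gram check catches long verbatim overlaps inside otherwise different text.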

Core Capabilities

Advanced mathematical reasoning and problem-solving
Strong results from a small training set (about 3,000 examples)
Strong performance on standardized math benchmarks
Maintains data integrity through careful decontamination

Frequently Asked Questions

Q: What makes this model unique?

The model achieves near-SOTA performance (78.1% on AIME24, 65.9% on AIME25) using only 3,000 training examples, which shows that a strong base model can be improved on challenging mathematical benchmarks with very little additional data.

Q: What are the recommended use cases?

The model is best suited to mathematical reasoning tasks, especially multi-step problems at the level of AIME competition mathematics.