# Light-R1-14B-DS
| Property | Value |
| --- | --- |
| Author | qihoo360 |
| Base Model | DeepSeek-R1-Distill-Qwen-14B |
| Release Date | March 12, 2025 |
| Model URL | Hugging Face |
## What is Light-R1-14B-DS?

Light-R1-14B-DS marks a breakthrough in mathematical reasoning: it is the first open-source model to successfully apply Reinforcement Learning (RL) to an already long-CoT finetuned model on a light computational budget. It achieves state-of-the-art performance among 14B-parameter models, scoring 74.0 on AIME 24 and 60.2 on AIME 25.
## Implementation Details

Built upon DeepSeek-R1-Distill-Qwen-14B, the model underwent specialized long-CoT RL post-training. Training showed the expected behavior of response length and reward increasing together, a significant advance in RL for mathematical reasoning. Key elements of the pipeline:

- Careful data decontamination using exact matching and 32-gram matching
- Specialized training focused on maintaining data integrity
- Successful application of RL to an already long-CoT finetuned model
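The decontamination step can be illustrated with a minimal sketch: a training example is dropped if it exactly matches a benchmark problem or shares any 32-gram with one. The function names and whitespace tokenization here are illustrative assumptions, not the authors' actual pipeline.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_text, bench_texts, n=32):
    """Flag a training example that exactly matches, or shares any
    n-gram with, any benchmark problem (simple whitespace tokens)."""
    norm = lambda s: s.lower().split()
    train_tokens = norm(train_text)
    train_grams = ngrams(train_tokens, n)
    for bench in bench_texts:
        bench_tokens = norm(bench)
        if train_tokens == bench_tokens:           # exact match
            return True
        if train_grams & ngrams(bench_tokens, n):  # n-gram overlap
            return True
    return False
```

In practice the benchmark n-gram sets would be precomputed once and training examples streamed against them.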
## Core Capabilities

- State-of-the-art performance on AIME mathematics benchmarks
- Strong performance on GPQA (61.7) without GPQA-specific training
- Enhanced long-form Chain-of-Thought reasoning
- Efficient performance under light computational requirements
## Frequently Asked Questions

Q: What makes this model unique?
It is the first successful application of RL to already long-CoT finetuned models in a computationally efficient manner, achieving SOTA results that surpass many 32B models.
Q: What are the recommended use cases?
The model excels in mathematical reasoning tasks, particularly in complex problem-solving scenarios requiring detailed step-by-step solutions, making it ideal for educational applications and mathematical research.