# Light-R1-14B-DS
| Property | Value |
| --- | --- |
| Author | qihoo360 |
| Base Model | DeepSeek-R1-Distill-Qwen-14B |
| Release Date | March 12, 2025 |
| Model URL | Hugging Face |
## What is Light-R1-14B-DS?

Light-R1-14B-DS marks a breakthrough in mathematical reasoning: it is the first open-source model to successfully apply Reinforcement Learning (RL) to an already long-CoT finetuned model on a light computational budget. It achieves state-of-the-art performance among 14B-parameter models, scoring 74.0 on AIME 24 and 60.2 on AIME 25.
## Implementation Details

Built upon DeepSeek-R1-Distill-Qwen-14B, the model underwent specialized long-CoT RL post-training. Training showed the expected behavior of response length and reward increasing together, a significant advance in RL for mathematical reasoning. Key elements of the pipeline:

- Careful data decontamination using exact matching and 32-gram matching
- Specialized training focused on maintaining data integrity
- Successful application of RL to an already long-CoT finetuned model
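The decontamination step can be illustrated with a minimal sketch: a training example is dropped if it exactly matches a benchmark problem or shares any 32-gram with one. The function names and whitespace tokenization here are illustrative assumptions, not the authors' actual pipeline.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_text, bench_texts, n=32):
    """Flag a training example that exactly matches, or shares any
    n-gram with, any benchmark problem (simple whitespace tokens)."""
    norm = lambda s: s.lower().split()
    train_tokens = norm(train_text)
    train_grams = ngrams(train_tokens, n)
    for bench in bench_texts:
        bench_tokens = norm(bench)
        if train_tokens == bench_tokens:           # exact match
            return True
        if train_grams & ngrams(bench_tokens, n):  # n-gram overlap
            return True
    return False
```

In practice the benchmark n-gram sets would be precomputed once and training examples streamed against them.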
## Core Capabilities

- State-of-the-art performance on AIME mathematics benchmarks
- Strong performance on GPQA (61.7) without GPQA-specific training
- Enhanced long-form Chain-of-Thought reasoning
- Efficient performance under light computational requirements
## Frequently Asked Questions

Q: What makes this model unique?
It is the first successful application of RL to already long-CoT finetuned models in a computationally efficient manner, achieving SOTA results that surpass many 32B models.
Q: What are the recommended use cases?
The model excels in mathematical reasoning tasks, particularly in complex problem-solving scenarios requiring detailed step-by-step solutions, making it ideal for educational applications and mathematical research.