Light-R1-14B-DS

Maintained by: qihoo360

  • Author: qihoo360
  • Base Model: DeepSeek-R1-Distill-Qwen-14B
  • Release Date: March 12, 2025
  • Model URL: Hugging Face

What is Light-R1-14B-DS?

Light-R1-14B-DS marks a breakthrough in mathematical reasoning: it is the first open-source model to successfully apply Reinforcement Learning (RL) to an already long-CoT finetuned model under a light computational budget. It achieves state-of-the-art performance among 14B-parameter models, scoring 74.0 on AIME 24 and 60.2 on AIME 25.
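
For orientation, here is a minimal inference sketch using the Hugging Face Transformers library. The repository id, chat-template usage, sampling settings, and token budget below are assumptions for illustration, not official recommendations from the model card.

```python
# Minimal inference sketch, assuming the checkpoint is published as
# "qihoo360/Light-R1-14B-DS" on Hugging Face and fits on the available GPU(s).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/Light-R1-14B-DS"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Solve step by step: how many positive divisors does 360 have?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long chain-of-thought models need a generous token budget for their reasoning trace.
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```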

Implementation Details

Built on DeepSeek-R1-Distill-Qwen-14B, the model underwent specialized long-CoT RL post-training. During training, response length and reward increased simultaneously, the behavior expected of successful RL on reasoning models and a significant advance for RL applied to mathematical reasoning.

  • Careful data decontamination using exact matching and 32-gram matching (a minimal sketch follows this list)
  • Specialized training focusing on maintaining data integrity
  • Successful implementation of RL on already long-COT finetuned models
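
The decontamination code itself is not reproduced here, but the idea is simple: drop any training example that exactly matches, or shares a long n-gram with, a benchmark problem. Below is a minimal sketch under those assumptions, using whitespace tokenization and hypothetical `train_set` / `aime_problems` collections.

```python
from typing import Iterable, Set

def ngrams(text: str, n: int = 32) -> Set[str]:
    """Return the set of whitespace-token n-grams of a string."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_example: str,
                    benchmark_problems: Iterable[str],
                    n: int = 32) -> bool:
    """Flag a training example that exactly matches, or shares any
    n-gram with, a benchmark problem."""
    train_grams = ngrams(train_example, n)
    for problem in benchmark_problems:
        if train_example.strip() == problem.strip():   # exact match
            return True
        if train_grams & ngrams(problem, n):           # 32-gram overlap
            return True
    return False

# Hypothetical usage: keep only training examples that pass the filter.
# clean = [ex for ex in train_set if not is_contaminated(ex, aime_problems)]
```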

Core Capabilities

  • State-of-the-art performance on AIME mathematics benchmarks
  • Strong out-of-domain generalization: 61.7 on GPQA without GPQA-specific training
  • Enhanced long-form Chain of Thought reasoning
  • Efficient performance under light computational requirements

Frequently Asked Questions

Q: What makes this model unique?

The model is the first successful attempt to apply RL to an already long-CoT finetuned model in a computationally efficient manner, achieving SOTA results that outperform many 32B models.

Q: What are the recommended use cases?

The model excels in mathematical reasoning tasks, particularly in complex problem-solving scenarios requiring detailed step-by-step solutions, making it ideal for educational applications and mathematical research.
