DeepSeek-R1-Distill-Qwen-7B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-Math-7B |
| License | MIT License |
| Context Length | 32,768 tokens |
| Paper | arXiv:2501.12948 |
What is DeepSeek-R1-Distill-Qwen-7B?
DeepSeek-R1-Distill-Qwen-7B is a distilled version of the larger DeepSeek-R1 model, specifically designed to maintain strong reasoning capabilities while being more accessible with only 7B parameters. It's built upon the Qwen2.5-Math-7B architecture and has been fine-tuned using carefully curated samples from DeepSeek-R1.
Implementation Details
The model is distilled from the 671B-parameter DeepSeek-R1 by supervised fine-tuning on reasoning samples curated from that larger model, transferring its reasoning patterns into a much smaller network. On the reported benchmarks it reaches 55.5% pass@1 on AIME 2024 and 92.8% pass@1 on MATH-500.
- Optimized for mathematical reasoning and coding tasks
- Supports up to 32,768 token context length
- Compatible with vLLM and SGLang deployment (see the sketch after this list)
- Recommended sampling temperature of 0.6
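As an illustration of local deployment, the sketch below uses vLLM's offline `LLM` API to load the model and sample at the recommended temperature of 0.6. The prompt text, `top_p`, and `max_tokens` values are illustrative assumptions, not settings taken from the model card, and the exact vLLM version and hardware requirements may differ in practice.

```python
# Minimal vLLM offline-inference sketch (assumes vLLM is installed and a GPU
# with enough memory is available; prompt and max_tokens are placeholders).
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    max_model_len=32768,  # full 32,768-token context window
)

sampling_params = SamplingParams(
    temperature=0.6,   # recommended setting from the model card
    top_p=0.95,        # illustrative value
    max_tokens=4096,   # leave room for long chain-of-thought output
)

prompt = (
    "Please reason step by step, and put your final answer within \\boxed{}.\n"
    "What is the sum of the first 50 positive integers?"
)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```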
Core Capabilities
- Strong performance in mathematical problem-solving
- Advanced reasoning abilities inherited from DeepSeek-R1
- Efficient coding task completion
- Step-by-step reasoning capabilities (see the prompting sketch below)
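To illustrate step-by-step prompting, the sketch below loads the weights with Hugging Face `transformers` (an assumption; the model card only names vLLM and SGLang for deployment), applies the chat template, and asks for the final answer in `\boxed{}`, a common pattern for math prompts with this model family. The question text and generation limits are placeholders.

```python
# Hedged sketch: loading with Hugging Face transformers and sampling at the
# recommended temperature of 0.6. The question text is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# All instructions go in the user turn; no system prompt is used here.
messages = [
    {
        "role": "user",
        "content": (
            "Please reason step by step, and put your final answer within \\boxed{}.\n"
            "Solve for x: 3x + 7 = 22"
        ),
    }
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    inputs,
    max_new_tokens=2048,  # reasoning traces can be long
    do_sample=True,
    temperature=0.6,      # recommended setting
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```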
Frequently Asked Questions
Q: What makes this model unique?
A: This model uniquely combines the efficiency of a 7B parameter architecture with sophisticated reasoning capabilities distilled from a much larger model (671B parameters), making it particularly effective for mathematical and coding tasks while remaining computationally accessible.
Q: What are the recommended use cases?
A: The model excels in mathematical problem-solving, coding tasks, and situations requiring step-by-step reasoning. It is particularly suitable for applications that need both computational efficiency and strong reasoning capabilities.