DeepSeek-R1-Distill-Qwen-32B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-32B |
| License | MIT License |
| Context Length | 32,768 tokens |
| Paper | arXiv:2501.12948 |
What is DeepSeek-R1-Distill-Qwen-32B?
DeepSeek-R1-Distill-Qwen-32B is a distilled language model that transfers the reasoning capabilities of the much larger DeepSeek-R1 into a 32B-parameter architecture. Built on Qwen2.5-32B, it performs strongly on reasoning benchmarks, particularly mathematical problem-solving and coding tasks.
Implementation Details
The model is distilled from DeepSeek-R1 by fine-tuning Qwen2.5-32B on roughly 800k curated reasoning samples. It retains much of the parent model's reasoning performance while being considerably easier to deploy.
- Achieves 72.6% pass@1 on AIME 2024
- 94.3% accuracy on MATH-500
- 1691 rating on CodeForces
- Supports a context length of 32,768 tokens
Core Capabilities
- Advanced mathematical reasoning and problem-solving
- Strong coding performance across multiple languages
- Efficient processing with reduced parameter count
- Compatible with popular deployment frameworks such as vLLM and SGLang (see the sketch below)
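To illustrate the framework compatibility noted above, the sketch below runs offline inference through vLLM's Python API. It is a minimal, illustrative example only: the tensor-parallel degree, sampling settings, and prompt are assumptions, not values taken from this card.

```python
# Minimal offline-inference sketch with vLLM (illustrative settings, not official defaults).
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

# Assumption: a 2-GPU host; adjust tensor_parallel_size to your hardware.
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, tensor_parallel_size=2, max_model_len=32768)

# Build a single-turn chat prompt with the model's own chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Please reason step by step, and put your final answer "
                                  "within \\boxed{}. What is the sum of the first 50 "
                                  "positive integers?"}],
    tokenize=False,
    add_generation_prompt=True,
)

# Temperature 0.6 follows the recommendation given later in this card; top_p and
# max_tokens are illustrative choices.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

The same model can instead be served as a long-running endpoint (for example with `vllm serve` or SGLang), which is usually preferable when multiple clients share one deployment.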
Frequently Asked Questions
Q: What makes this model unique?
This model uniquely combines the reasoning capabilities of larger models with the efficiency of a 32B parameter architecture, outperforming OpenAI-o1-mini across various benchmarks.
Q: What are the recommended use cases?
The model excels in mathematical problem-solving, coding tasks, and general reasoning applications. It's particularly effective when used with a temperature setting of 0.6 and explicit step-by-step reasoning prompts.
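To make those recommendations concrete, here is a hedged sketch of a chat request against an OpenAI-compatible endpoint such as one exposed by vLLM or SGLang. The local URL, port, and example question are assumptions for illustration; the prompt wording simply mirrors the step-by-step advice above.

```python
# Hedged sketch: querying a locally served copy of the model through an
# OpenAI-compatible API. The base_url/port and the question are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    # Instructions go in the user turn and explicitly ask for step-by-step reasoning.
    messages=[{
        "role": "user",
        "content": "Please reason step by step, and put your final answer within "
                   "\\boxed{}. If 3x + 7 = 22, what is x?",
    }],
    temperature=0.6,  # the temperature recommended above
    max_tokens=4096,
)
print(response.choices[0].message.content)
```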