DeepSeek-R1-Distill-Qwen-7B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-Math-7B |
| License | MIT License |
| Context Length | 32,768 tokens |
| Paper | arXiv:2501.12948 |
What is DeepSeek-R1-Distill-Qwen-7B?
DeepSeek-R1-Distill-Qwen-7B is a distilled version of the larger DeepSeek-R1 model, specifically designed to maintain strong reasoning capabilities while being more accessible with only 7B parameters. It's built upon the Qwen2.5-Math-7B architecture and has been fine-tuned using carefully curated samples from DeepSeek-R1.
Implementation Details
The model is distilled from the 671B-parameter DeepSeek-R1 by supervised fine-tuning on reasoning samples curated from that larger model, transferring its reasoning patterns into a much smaller network. On the reported benchmarks it reaches 55.5% pass@1 on AIME 2024 and 92.8% pass@1 on MATH-500.
- Optimized for mathematical reasoning and coding tasks
- Supports up to 32,768 token context length
- Compatible with vLLM and SGLang deployment (see the sketch after this list)
- Recommended sampling temperature of 0.6
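As an illustration of local deployment, the sketch below uses vLLM's offline `LLM` API to load the model and sample at the recommended temperature of 0.6. The prompt text, `top_p`, and `max_tokens` values are illustrative assumptions, not settings taken from the model card, and the exact vLLM version and hardware requirements may differ in practice.

```python
# Minimal vLLM offline-inference sketch (assumes vLLM is installed and a GPU
# with enough memory is available; prompt and max_tokens are placeholders).
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    max_model_len=32768,  # full 32,768-token context window
)

sampling_params = SamplingParams(
    temperature=0.6,   # recommended setting from the model card
    top_p=0.95,        # illustrative value
    max_tokens=4096,   # leave room for long chain-of-thought output
)

prompt = (
    "Please reason step by step, and put your final answer within \\boxed{}.\n"
    "What is the sum of the first 50 positive integers?"
)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```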
Core Capabilities
- Strong performance in mathematical problem-solving
- Advanced reasoning abilities inherited from DeepSeek-R1
- Efficient coding task completion
- Step-by-step reasoning capabilities (see the prompting sketch below)
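To illustrate step-by-step prompting, the sketch below loads the weights with Hugging Face `transformers` (an assumption; the model card only names vLLM and SGLang for deployment), applies the chat template, and asks for the final answer in `\boxed{}`, a common pattern for math prompts with this model family. The question text and generation limits are placeholders.

```python
# Hedged sketch: loading with Hugging Face transformers and sampling at the
# recommended temperature of 0.6. The question text is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# All instructions go in the user turn; no system prompt is used here.
messages = [
    {
        "role": "user",
        "content": (
            "Please reason step by step, and put your final answer within \\boxed{}.\n"
            "Solve for x: 3x + 7 = 22"
        ),
    }
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    inputs,
    max_new_tokens=2048,  # reasoning traces can be long
    do_sample=True,
    temperature=0.6,      # recommended setting
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```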
Frequently Asked Questions
Q: What makes this model unique?
A: This model uniquely combines the efficiency of a 7B parameter architecture with sophisticated reasoning capabilities distilled from a much larger model (671B parameters), making it particularly effective for mathematical and coding tasks while remaining computationally accessible.
Q: What are the recommended use cases?
A: The model excels in mathematical problem-solving, coding tasks, and situations requiring step-by-step reasoning. It is particularly suitable for applications that need both computational efficiency and strong reasoning capabilities.