DeepSeek-R1-Distill-Qwen-7B

Maintained by: deepseek-ai


Property         Value
Base Model       Qwen2.5-Math-7B
License          MIT License
Context Length   32,768 tokens
Paper            arXiv:2501.12948

What is DeepSeek-R1-Distill-Qwen-7B?

DeepSeek-R1-Distill-Qwen-7B is a distilled version of the larger DeepSeek-R1 model, specifically designed to maintain strong reasoning capabilities while being more accessible with only 7B parameters. It's built upon the Qwen2.5-Math-7B architecture and has been fine-tuned using carefully curated samples from DeepSeek-R1.

Implementation Details

The model was distilled from the 671B-parameter DeepSeek-R1: the Qwen2.5-Math-7B base was fine-tuned on reasoning samples generated by DeepSeek-R1 to transfer its reasoning patterns. It achieves strong results, including 55.5% pass@1 on AIME 2024 and 92.8% pass@1 on MATH-500.

  • Optimized for mathematical reasoning and coding tasks
  • Supports up to 32,768 token context length
  • Compatible with vLLM and SGLang deployment (see the sketch after this list)
  • Recommended temperature setting of 0.6
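
As a rough illustration of local deployment, the sketch below loads the model with vLLM's offline-inference API and applies the recommended sampling settings. It assumes the model is available under the Hugging Face id deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and that a single GPU with enough memory for the 7B weights is present; adjust max_model_len and max_tokens for your hardware.

```python
# Sketch: running DeepSeek-R1-Distill-Qwen-7B with vLLM's offline-inference API.
# The Hugging Face model id and prompt wording below are assumptions.
from vllm import LLM, SamplingParams

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed Hugging Face id

# Recommended temperature 0.6; keep max_model_len within the 32,768-token window.
llm = LLM(model=MODEL_ID, max_model_len=32768)
sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)

prompt = (
    "Please reason step by step, and put your final answer within \\boxed{}.\n"
    "What is the sum of the first 100 positive integers?"
)

outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```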

Core Capabilities

  • Strong performance in mathematical problem-solving
  • Advanced reasoning abilities inherited from DeepSeek-R1
  • Efficient coding task completion
  • Step-by-step reasoning capabilities

Frequently Asked Questions

Q: What makes this model unique?

This model uniquely combines the efficiency of a 7B parameter architecture with sophisticated reasoning capabilities distilled from a much larger model (671B parameters), making it particularly effective for mathematical and coding tasks while remaining computationally accessible.

Q: What are the recommended use cases?

The model excels in mathematical problem-solving, coding tasks, and situations requiring step-by-step reasoning. It's particularly suitable for applications requiring both computational efficiency and strong reasoning capabilities.
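
For a quick local check without a dedicated inference server, a plain Hugging Face transformers call also works. The snippet below is a minimal sketch assuming the same deepseek-ai/DeepSeek-R1-Distill-Qwen-7B model id and the tokenizer's built-in chat template; sampling follows the recommended temperature of 0.6.

```python
# Sketch: step-by-step math reasoning with Hugging Face transformers.
# Model id and prompt wording are assumptions; adjust for your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed Hugging Face id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Please reason step by step, and put your final answer "
                "within \\boxed{}. If 3x + 7 = 22, what is x?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended sampling: temperature 0.6 with nucleus sampling.
output_ids = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```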
