DeepSeek-R1-Distill-Qwen-32B

Maintained By
deepseek-ai

Property         Value
Base Model       Qwen2.5-32B
License          MIT License
Context Length   32,768 tokens
Paper            arXiv:2501.12948

What is DeepSeek-R1-Distill-Qwen-32B?

DeepSeek-R1-Distill-Qwen-32B is a distilled language model that brings the reasoning capabilities of the much larger DeepSeek-R1 into a more efficient 32B-parameter architecture. Built on Qwen2.5-32B, it performs strongly across a range of benchmarks, particularly in mathematical reasoning and coding tasks.

Implementation Details

The model was created by fine-tuning Qwen2.5-32B on roughly 800k reasoning samples curated with DeepSeek-R1. It retains much of the parent model's reasoning performance while being far easier to deploy.

  • 72.6% pass@1 on AIME 2024
  • 94.3% pass@1 on MATH-500
  • 1691 rating on Codeforces
  • Supports a context length of 32,768 tokens

Core Capabilities

  • Advanced mathematical reasoning and problem-solving
  • Strong coding performance across multiple languages
  • Efficient processing with reduced parameter count
  • Compatible with popular deployment frameworks such as vLLM and SGLang (see the serving sketch below)
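
As a concrete illustration, here is a minimal serving sketch using vLLM's Python API. The model ID is the public Hugging Face repository name; the tensor-parallel size, sampling settings, and prompt are illustrative assumptions, and production use should apply the model's chat template rather than a raw prompt string.

    from vllm import LLM, SamplingParams

    # Load the checkpoint; tensor_parallel_size is an assumption -- adjust
    # to the number of GPUs available on your machine.
    llm = LLM(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        tensor_parallel_size=2,
        max_model_len=32768,  # the model's full context length
    )

    params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

    # A raw prompt keeps the sketch short; chat-formatted input via the
    # model's chat template is preferable in practice.
    prompt = "Find all real solutions of x^2 - 5x + 6 = 0. Reason step by step."
    outputs = llm.generate([prompt], params)
    print(outputs[0].outputs[0].text)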

Frequently Asked Questions

Q: What makes this model unique?

This model combines the reasoning capabilities of much larger models with the efficiency of a 32B-parameter architecture, outperforming OpenAI-o1-mini across a range of benchmarks.

Q: What are the recommended use cases?

The model excels at mathematical problem-solving, coding tasks, and general reasoning. It is particularly effective with a temperature around 0.6 and prompts that explicitly ask for step-by-step reasoning.
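
A minimal generation sketch with Hugging Face transformers might look like the following; the prompt text and token budget are illustrative assumptions, while the sampling settings follow the recommendation above.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Keep all instructions in the user turn and ask for step-by-step reasoning.
    messages = [{"role": "user",
                 "content": "If 3x + 7 = 22, what is x? Reason step by step."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(
        input_ids,
        max_new_tokens=1024,   # illustrative budget; reasoning traces can be long
        do_sample=True,
        temperature=0.6,       # recommended setting
        top_p=0.95,
    )
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

Note that sampling is used rather than greedy decoding; DeepSeek's usage notes indicate that greedy decoding can lead to repetition with these reasoning-tuned checkpoints.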
