DeepSeek-R1-Distill-Qwen-32B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-32B |
| License | MIT License |
| Context Length | 32,768 tokens |
| Paper | arXiv:2501.12948 |
What is DeepSeek-R1-Distill-Qwen-32B?
DeepSeek-R1-Distill-Qwen-32B is a distilled language model that transfers the reasoning capabilities of the much larger DeepSeek-R1 into a 32B-parameter architecture. Built on Qwen2.5-32B, it performs strongly on reasoning benchmarks, particularly mathematical problem-solving and coding tasks.
Implementation Details
The model is distilled from DeepSeek-R1 by fine-tuning Qwen2.5-32B on roughly 800k curated reasoning samples. It retains much of the parent model's reasoning performance while being considerably easier to deploy.
- Achieves 72.6% pass@1 on AIME 2024
- 94.3% accuracy on MATH-500
- 1691 rating on CodeForces
- Supports a context length of 32,768 tokens
Core Capabilities
- Advanced mathematical reasoning and problem-solving
- Strong coding performance across multiple languages
- Efficient processing with reduced parameter count
- Compatible with popular deployment frameworks such as vLLM and SGLang (see the sketch below)
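To illustrate the framework compatibility noted above, the sketch below runs offline inference through vLLM's Python API. It is a minimal, illustrative example only: the tensor-parallel degree, sampling settings, and prompt are assumptions, not values taken from this card.

```python
# Minimal offline-inference sketch with vLLM (illustrative settings, not official defaults).
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

# Assumption: a 2-GPU host; adjust tensor_parallel_size to your hardware.
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, tensor_parallel_size=2, max_model_len=32768)

# Build a single-turn chat prompt with the model's own chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Please reason step by step, and put your final answer "
                                  "within \\boxed{}. What is the sum of the first 50 "
                                  "positive integers?"}],
    tokenize=False,
    add_generation_prompt=True,
)

# Temperature 0.6 follows the recommendation given later in this card; top_p and
# max_tokens are illustrative choices.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

The same model can instead be served as a long-running endpoint (for example with `vllm serve` or SGLang), which is usually preferable when multiple clients share one deployment.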
Frequently Asked Questions
Q: What makes this model unique?
This model uniquely combines the reasoning capabilities of larger models with the efficiency of a 32B parameter architecture, outperforming OpenAI-o1-mini across various benchmarks.
Q: What are the recommended use cases?
The model excels in mathematical problem-solving, coding tasks, and general reasoning applications. It's particularly effective when used with a temperature setting of 0.6 and explicit step-by-step reasoning prompts.
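To make those recommendations concrete, here is a hedged sketch of a chat request against an OpenAI-compatible endpoint such as one exposed by vLLM or SGLang. The local URL, port, and example question are assumptions for illustration; the prompt wording simply mirrors the step-by-step advice above.

```python
# Hedged sketch: querying a locally served copy of the model through an
# OpenAI-compatible API. The base_url/port and the question are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    # Instructions go in the user turn and explicitly ask for step-by-step reasoning.
    messages=[{
        "role": "user",
        "content": "Please reason step by step, and put your final answer within "
                   "\\boxed{}. If 3x + 7 = 22, what is x?",
    }],
    temperature=0.6,  # the temperature recommended above
    max_tokens=4096,
)
print(response.choices[0].message.content)
```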