# DeepSeek-R1-Distill-Qwen-14B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-14B |
| License | MIT License |
| Context Length | 32,768 tokens |
| Paper | ArXiv Link |
## What is DeepSeek-R1-Distill-Qwen-14B?
DeepSeek-R1-Distill-Qwen-14B is a distilled version of the much larger DeepSeek-R1 model (671B parameters), optimized for mathematical reasoning and code generation. It compresses a substantial share of the larger model's reasoning capability into a far more deployable 14B-parameter model while maintaining strong performance.
## Implementation Details
The model is built on the Qwen2.5-14B architecture and fine-tuned on carefully curated samples generated by the original DeepSeek-R1 model. It performs strongly across a range of benchmarks, particularly in mathematical reasoning, scoring 69.7% pass@1 on AIME 2024 and 93.9% pass@1 on MATH-500.
- Recommended temperature: 0.6 (helps avoid repetitive or incoherent output)
- Recommended top-p: 0.95
- Maximum generation length: 32,768 tokens
- MIT license: supports commercial use and modification
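Assuming the model is served through the Hugging Face `transformers` generate API, the recommended settings above can be collected into a reusable parameter dict. This is a sketch: model and tokenizer loading are omitted, and the keyword names follow the `transformers` generation API rather than anything mandated by the model card.

```python
# Recommended decoding parameters from the model card, expressed as
# keyword arguments for Hugging Face's `model.generate(...)`.
SAMPLING_PARAMS = {
    "do_sample": True,        # sampling (not greedy decoding) with the settings below
    "temperature": 0.6,       # recommended temperature
    "top_p": 0.95,            # recommended nucleus-sampling value
    "max_new_tokens": 32768,  # maximum generation length
}

# Hypothetical usage once model/inputs exist:
#   outputs = model.generate(**inputs, **SAMPLING_PARAMS)
```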
## Core Capabilities
- Strong mathematical reasoning abilities with step-by-step problem solving
- Advanced code generation and comprehension
- High performance on complex reasoning tasks
- Lower computational requirements than the full DeepSeek-R1, making local and on-premise deployment practical
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for retaining much of the reasoning capability of the larger DeepSeek-R1 while being significantly more compact and easier to deploy. It is particularly strong at mathematical reasoning and coding, making it valuable for educational and development applications.
Q: What are the recommended use cases?
The model is particularly well-suited for mathematical problem solving, coding tasks, and complex reasoning scenarios. It's recommended for applications requiring step-by-step problem solving, code generation, and technical analysis, while being more resource-efficient than larger models.
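For step-by-step mathematical problem solving, one common prompting pattern for reasoning models of this family is to ask the model to reason step by step and place its final answer in a box. The helper name below is hypothetical, and applying this pattern to the distilled model is an assumption; a minimal sketch:

```python
# Build a chat-style prompt for a math problem. Asking for step-by-step
# reasoning and a boxed final answer makes the answer easy to extract.
def build_math_prompt(problem: str) -> list[dict]:
    instruction = (
        "Please reason step by step, and put your final answer within \\boxed{}."
    )
    return [{"role": "user", "content": f"{problem}\n{instruction}"}]

messages = build_math_prompt("Solve for x: 2x + 3 = 11.")
# The messages list would then be rendered with the tokenizer's chat
# template, e.g. tokenizer.apply_chat_template(messages, ...).
```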