# DeepSeek-R1-Distill-Qwen-14B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-14B |
| License | MIT License |
| Context Length | 32,768 tokens |
| Paper | ArXiv Link |
## What is DeepSeek-R1-Distill-Qwen-14B?
DeepSeek-R1-Distill-Qwen-14B is a distilled version of the much larger DeepSeek-R1 model (671B parameters), optimized for mathematical reasoning and code generation. It compresses a substantial share of the larger model's reasoning capability into a far more deployable 14B-parameter model while maintaining strong performance.
## Implementation Details
The model is built on the Qwen2.5-14B architecture and fine-tuned on carefully curated samples generated by the original DeepSeek-R1 model. It performs strongly across a range of benchmarks, particularly in mathematical reasoning, scoring 69.7% pass@1 on AIME 2024 and 93.9% pass@1 on MATH-500.
- Recommended temperature: 0.6 (helps avoid repetitive or incoherent output)
- Recommended top-p: 0.95
- Maximum generation length: 32,768 tokens
- MIT license: supports commercial use and modification
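Assuming the model is served through the Hugging Face `transformers` generate API, the recommended settings above can be collected into a reusable parameter dict. This is a sketch: model and tokenizer loading are omitted, and the keyword names follow the `transformers` generation API rather than anything mandated by the model card.

```python
# Recommended decoding parameters from the model card, expressed as
# keyword arguments for Hugging Face's `model.generate(...)`.
SAMPLING_PARAMS = {
    "do_sample": True,        # sampling (not greedy decoding) with the settings below
    "temperature": 0.6,       # recommended temperature
    "top_p": 0.95,            # recommended nucleus-sampling value
    "max_new_tokens": 32768,  # maximum generation length
}

# Hypothetical usage once model/inputs exist:
#   outputs = model.generate(**inputs, **SAMPLING_PARAMS)
```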
## Core Capabilities
- Strong mathematical reasoning abilities with step-by-step problem solving
- Advanced code generation and comprehension
- High performance on complex reasoning tasks
- Lower computational requirements than the full DeepSeek-R1, making local and on-premise deployment practical
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for retaining much of the reasoning capability of the larger DeepSeek-R1 while being significantly more compact and easier to deploy. It is particularly strong at mathematical reasoning and coding, making it valuable for educational and development applications.
Q: What are the recommended use cases?
The model is particularly well-suited for mathematical problem solving, coding tasks, and complex reasoning scenarios. It's recommended for applications requiring step-by-step problem solving, code generation, and technical analysis, while being more resource-efficient than larger models.
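For step-by-step mathematical problem solving, one common prompting pattern for reasoning models of this family is to ask the model to reason step by step and place its final answer in a box. The helper name below is hypothetical, and applying this pattern to the distilled model is an assumption; a minimal sketch:

```python
# Build a chat-style prompt for a math problem. Asking for step-by-step
# reasoning and a boxed final answer makes the answer easy to extract.
def build_math_prompt(problem: str) -> list[dict]:
    instruction = (
        "Please reason step by step, and put your final answer within \\boxed{}."
    )
    return [{"role": "user", "content": f"{problem}\n{instruction}"}]

messages = build_math_prompt("Solve for x: 2x + 3 = 11.")
# The messages list would then be rendered with the tokenizer's chat
# template, e.g. tokenizer.apply_chat_template(messages, ...).
```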