DeepSeek-R1-Distill-Qwen-14B

Maintained By: deepseek-ai


Base Model: Qwen2.5-14B
License: MIT License
Context Length: 32,768 tokens
Paper: "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" (arXiv)

What is DeepSeek-R1-Distill-Qwen-14B?

DeepSeek-R1-Distill-Qwen-14B is a distilled version of DeepSeek-R1, with particular strength in mathematical reasoning and code generation. It compresses much of the reasoning ability of a far larger model (DeepSeek-R1 is a mixture-of-experts model with 671B total parameters) into a dense 14B-parameter model. The result is much easier to deploy and still scores strongly on reasoning benchmarks.
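In this setting, "distillation" means ordinary supervised fine-tuning on text written by the teacher model, rather than matching the teacher's output probabilities. The objective below is the standard next-token cross-entropy for that setup. It is written here as a sketch, not quoted from the paper. $\mathcal{D}$ is assumed to be a set of prompts $x$ paired with DeepSeek-R1-generated responses $y$, and $p_\theta$ is the student model initialized from Qwen2.5-14B.

```latex
\mathcal{L}(\theta) = -\sum_{(x,\,y) \in \mathcal{D}} \; \sum_{t=1}^{|y|} \log p_\theta\!\left(y_t \mid x,\, y_{<t}\right)
```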

Implementation Details

The model is built on the Qwen2.5-14B base model. It was fine-tuned on roughly 800k carefully curated samples generated with DeepSeek-R1, using supervised fine-tuning only; the distilled models received no separate reinforcement-learning stage. It performs particularly well on mathematical reasoning benchmarks, scoring 69.7% pass@1 on AIME 2024 and 93.9% pass@1 on MATH-500.
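The DeepSeek-R1 report estimates pass@1 by sampling several responses per question (at temperature 0.6 and top-p 0.95) and averaging their correctness. The sketch below computes that average. It assumes a grader has already marked each sampled response correct or incorrect, and the data is made up, not real benchmark output.

```python
from typing import Sequence


def pass_at_1(samples_per_problem: Sequence[Sequence[bool]]) -> float:
    """Average, over problems, of the fraction of sampled responses that are correct."""
    if not samples_per_problem:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in samples_per_problem:
        if not samples:
            raise ValueError("each problem needs at least one sampled response")
        # Each correct response counts as 1 and each incorrect one as 0.
        per_problem.append(sum(samples) / len(samples))
    return sum(per_problem) / len(per_problem)


# Toy correctness flags (not real results): 3 problems, 4 samples each.
toy = [
    [True, True, False, True],     # 0.75
    [False, False, False, False],  # 0.0
    [True, True, True, True],      # 1.0
]
print(f"pass@1 = {pass_at_1(toy):.4f}")
```

Averaging over k samples gives a lower-variance estimate than scoring a single greedy answer per problem.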

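Recommended settings:

  • Temperature: 0.6 (the model card suggests staying within 0.5 to 0.7 to avoid endless repetition or incoherent output)
  • Top-p: 0.95
  • Maximum generation length: 32,768 tokens
  • License: MIT, which permits commercial use and modifications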
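Temperature and top-p are generic decoding controls. The pure-Python sketch below shows what the recommended values do to a toy next-token distribution:

- Temperature below 1 sharpens the distribution.
- Top-p keeps only the smallest set of most-likely tokens whose combined probability reaches 0.95.

This is a textbook nucleus-sampling illustration. The logits are made up, and it is not the sampler of any particular inference engine.

```python
import math
import random
from typing import Dict, Optional

# Values taken from the recommended settings above.
TEMPERATURE = 0.6
TOP_P = 0.95


def nucleus_distribution(logits: Dict[str, float], temperature: float = TEMPERATURE,
                         top_p: float = TOP_P) -> Dict[str, float]:
    """Temperature-scale the logits, keep the top-p nucleus, and renormalize."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exps.values())
    # Sort tokens from most to least likely.
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:  # smallest prefix reaching top_p
            break
    mass = sum(p for _, p in kept)
    return {tok: p / mass for tok, p in kept}


def sample_token(logits: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Draw one token from the truncated, renormalized distribution."""
    rng = rng or random.Random()
    dist = nucleus_distribution(logits)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


toy_logits = {"a": 2.0, "b": 1.0, "c": -3.0}  # made-up logits
print(nucleus_distribution(toy_logits))  # "c" falls outside the nucleus
print(sample_token(toy_logits, random.Random(0)))
```

With these toy logits, lowering the temperature from 1.0 to 0.6 raises the probability of "a" from about 0.73 to about 0.84. Top-p then removes the near-zero tail token "c" entirely.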

Core Capabilities

  • Strong mathematical reasoning abilities with step-by-step problem solving
  • Advanced code generation and comprehension
  • Strong performance on complex, multi-step reasoning tasks
  • Far lower memory and compute requirements than the full DeepSeek-R1 (14B vs. 671B total parameters)
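To make the size difference concrete, the back-of-the-envelope arithmetic below estimates memory for the model weights alone. Several simplifications apply:

- It ignores the KV cache, activations, and framework overhead.
- It uses the nominal parameter counts (14B and 671B); actual counts differ slightly.
- DeepSeek-R1 is mixture-of-experts, so only part of its weights are active per token, but all of them still have to be stored.

The precisions listed are illustrative choices.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts from the model descriptions.
DISTILL_PARAMS = 14e9
R1_PARAMS = 671e9

for label, params in [("R1-Distill-Qwen-14B", DISTILL_PARAMS), ("DeepSeek-R1", R1_PARAMS)]:
    for precision, nbytes in [("bf16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
        print(f"{label:<20} {precision:>5}: {weight_memory_gb(params, nbytes):8.1f} GB")
```

At bf16 the 14B model's weights need about 28 GB, versus roughly 1,342 GB for 671B parameters at the same precision.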

Frequently Asked Questions

Q: What makes this model unique?

The model stands out because it retains much of the reasoning ability of the larger DeepSeek-R1 while being far more compact and easier to deploy. It performs particularly well on mathematical reasoning and coding tasks, which makes it valuable for educational and development applications.

Q: What are the recommended use cases?

The model is particularly well-suited for mathematical problem solving, coding tasks, and complex reasoning scenarios. It's recommended for applications requiring step-by-step problem solving, code generation, and technical analysis, while being more resource-efficient than larger models.
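For step-by-step math problems, the DeepSeek-R1 model card suggests two things:

- Putting all instructions in the user prompt (no system prompt).
- Adding the directive "Please reason step by step, and put your final answer within \boxed{}."

The distilled models emit their reasoning inside <think> ... </think> tags before the final answer. The helpers below (`build_math_prompt`, `split_reasoning`, `last_boxed` are hypothetical names, not part of any library) sketch how an application might build such a prompt and pull out the final answer. They assume that output shape.

```python
from typing import Optional, Tuple

# Directive recommended in the DeepSeek-R1 model card for math problems.
MATH_DIRECTIVE = "Please reason step by step, and put your final answer within \\boxed{}."


def build_math_prompt(problem: str) -> str:
    """Hypothetical helper: append the recommended directive to the user message."""
    return f"{problem}\n{MATH_DIRECTIVE}"


def split_reasoning(response: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, answer) at the first </think> tag."""
    head, sep, tail = response.partition("</think>")
    if not sep:  # no reasoning block found; treat everything as the answer
        return "", response.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces


sample = "<think>\n2 + 2 = 4, and half of 4 is 2.\n</think>\nThe answer is \\boxed{\\frac{4}{2}}."
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(last_boxed(answer))
```

Brace counting, rather than a simple regular expression, keeps nested LaTeX such as \frac{4}{2} intact inside the extracted answer.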
