# DeepSeek-R1-Distill-Qwen-1.5B

| Property | Value |
|---|---|
| Parameter Count | 1.5B |
| Base Model | Qwen2.5-Math-1.5B |
| License | MIT |
| Context Length | 32,768 tokens |
## What is DeepSeek-R1-Distill-Qwen-1.5B?
DeepSeek-R1-Distill-Qwen-1.5B is a compact language model distilled from the much larger DeepSeek-R1. Built on the Qwen2.5-Math-1.5B base, it inherits a substantial share of R1's chain-of-thought reasoning ability at a fraction of the parameter count, and it is optimized in particular for mathematical and logical reasoning tasks.
## Implementation Details
The model was produced by distilling DeepSeek-R1: roughly 800k reasoning samples curated with R1 were used to fine-tune the Qwen2.5-Math-1.5B base. Despite its small size, it performs strongly on mathematical reasoning and problem-solving benchmarks:
- Achieves 28.9% pass@1 on AIME 2024
- Scores 83.9% on MATH-500 benchmark
- Supports a context length of 32,768 tokens
- Compatible with vLLM and SGLang for deployment (see the vLLM sketch below)
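As a starting point for local deployment, here is a minimal sketch using vLLM's offline inference API. It assumes vLLM is installed (`pip install vllm`) and enough GPU memory for the full 32,768-token context; the prompt and sampling values are illustrative, not prescriptive.

```python
from vllm import LLM, SamplingParams

# Load the distilled model; max_model_len caps the context at the
# model's supported 32,768 tokens.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    max_model_len=32768,
)

# A moderate temperature keeps the chain of thought coherent
# without collapsing into repetition.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

prompt = "Solve for x: 2x + 3 = 11. Show your reasoning step by step."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

For serving over HTTP instead of offline batch inference, the same model ID can be passed to vLLM's OpenAI-compatible server or to SGLang.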
## Core Capabilities
- Mathematical reasoning and problem-solving
- Step-by-step solution generation
- Logical deduction and analysis
- Efficient inference with modest compute and memory requirements
## Frequently Asked Questions
Q: What makes this model unique?
It packs much of the reasoning capability of far larger models into a 1.5B-parameter footprint, making it practical to deploy in resource-constrained environments while retaining strong performance on mathematical and reasoning tasks.
Q: What are the recommended use cases?
The model is particularly well-suited to mathematical problem-solving, educational applications, and other scenarios that call for explicit logical reasoning. A sampling temperature of 0.5-0.7 is recommended, and math prompts should include a directive asking the model to reason step by step and place its final answer within \boxed{}, as in the sketch below.
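To illustrate those recommendations, here is a minimal sketch using Hugging Face Transformers. The step-by-step/\boxed{} directive follows the upstream guidance for math prompts; the example question and generation settings are our own choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Math directive: ask for step-by-step reasoning and a \boxed{} answer.
messages = [{
    "role": "user",
    "content": "How many prime numbers are there below 30? "
               "Please reason step by step, and put your final answer "
               "within \\boxed{}.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Temperature chosen from the recommended 0.5-0.7 band.
output = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Reasoning traces can be long, so leave generous headroom in `max_new_tokens` when the problem requires multi-step work.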