# DeepSeek-R1-Distill-Qwen-1.5B

| Property | Value |
|---|---|
| Parameter Count | 1.5B |
| Base Model | Qwen2.5-Math-1.5B |
| License | MIT |
| Context Length | 32,768 tokens |
## What is DeepSeek-R1-Distill-Qwen-1.5B?
DeepSeek-R1-Distill-Qwen-1.5B is a compact language model distilled from the much larger DeepSeek-R1. Built on the Qwen2.5-Math-1.5B base, it inherits a substantial share of R1's chain-of-thought reasoning ability at a fraction of the parameter count, and it is optimized in particular for mathematical and logical reasoning tasks.
## Implementation Details
The model was produced by distilling DeepSeek-R1: roughly 800k reasoning samples curated with R1 were used to fine-tune the Qwen2.5-Math-1.5B base. Despite its small size, it performs strongly on mathematical reasoning and problem-solving benchmarks:
- Achieves 28.9% pass@1 on AIME 2024
- Scores 83.9% on MATH-500 benchmark
- Supports a context length of 32,768 tokens
- Compatible with vLLM and SGLang for deployment (see the vLLM sketch below)
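As a starting point for local deployment, here is a minimal sketch using vLLM's offline inference API. It assumes vLLM is installed (`pip install vllm`) and enough GPU memory for the full 32,768-token context; the prompt and sampling values are illustrative, not prescriptive.

```python
from vllm import LLM, SamplingParams

# Load the distilled model; max_model_len caps the context at the
# model's supported 32,768 tokens.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    max_model_len=32768,
)

# A moderate temperature keeps the chain of thought coherent
# without collapsing into repetition.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

prompt = "Solve for x: 2x + 3 = 11. Show your reasoning step by step."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

For serving over HTTP instead of offline batch inference, the same model ID can be passed to vLLM's OpenAI-compatible server or to SGLang.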
## Core Capabilities
- Mathematical reasoning and problem-solving
- Step-by-step solution generation
- Logical deduction and analysis
- Efficient inference with modest compute and memory requirements
## Frequently Asked Questions
Q: What makes this model unique?
It packs much of the reasoning capability of far larger models into a 1.5B-parameter footprint, making it practical to deploy in resource-constrained environments while retaining strong performance on mathematical and reasoning tasks.
Q: What are the recommended use cases?
The model is particularly well-suited to mathematical problem-solving, educational applications, and other scenarios that call for explicit logical reasoning. A sampling temperature of 0.5-0.7 is recommended, and math prompts should include a directive asking the model to reason step by step and place its final answer within \boxed{}, as in the sketch below.
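To illustrate those recommendations, here is a minimal sketch using Hugging Face Transformers. The step-by-step/\boxed{} directive follows the upstream guidance for math prompts; the example question and generation settings are our own choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Math directive: ask for step-by-step reasoning and a \boxed{} answer.
messages = [{
    "role": "user",
    "content": "How many prime numbers are there below 30? "
               "Please reason step by step, and put your final answer "
               "within \\boxed{}.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Temperature chosen from the recommended 0.5-0.7 band.
output = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Reasoning traces can be long, so leave generous headroom in `max_new_tokens` when the problem requires multi-step work.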