DeepSeek-R1-Distill-Qwen-1.5B-GGUF
| Property | Value |
|---|---|
| Base Model | Qwen2.5-Math-1.5B |
| License | MIT License |
| Format | GGUF |
| Paper | arXiv:2501.12948 |
What is DeepSeek-R1-Distill-Qwen-1.5B-GGUF?
DeepSeek-R1-Distill-Qwen-1.5B-GGUF is a GGUF-format release of DeepSeek-R1-Distill-Qwen-1.5B, a model created by distilling the reasoning capabilities of the much larger DeepSeek-R1 into the 1.5B-parameter Qwen2.5-Math-1.5B base model. The result retains strong chain-of-thought reasoning at a size practical for local deployment.
Implementation Details
The model is implemented in GGUF format, making it compatible with llama.cpp for local deployment. It features specialized tokens for chat interactions (<|User|> and <|Assistant|>) and supports both CPU and GPU acceleration.
- Supports efficient inference with llama.cpp and its bindings (a minimal loading sketch follows this list)
- Optimized for both CPU and GPU deployment
- Recommended sampling temperature of 0.6 (within the suggested 0.5-0.7 range) to avoid repetitive or incoherent output
- Maximum generation length of 32,768 tokens
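Where llama-cpp-python (one common Python binding for llama.cpp) is available, loading and sampling might look like the following sketch. The GGUF filename, context size, and quantization suffix are illustrative assumptions, not fixed values from this release:

```python
# Minimal inference sketch using llama-cpp-python (pip install llama-cpp-python).
# The GGUF filename below is a placeholder -- use whichever quantization you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf",  # hypothetical local path
    n_ctx=8192,       # context window; raise if memory allows
    n_gpu_layers=-1,  # offload all layers to the GPU; set 0 for CPU-only inference
)

# The distilled models mark turns with <|User|> and <|Assistant|> tokens.
prompt = "<|User|>What is 17 * 24? Reason step by step.<|Assistant|>"

out = llm(
    prompt,
    max_tokens=2048,   # well under the 32,768-token generation ceiling
    temperature=0.6,   # the recommended setting for stable reasoning output
)
print(out["choices"][0]["text"])
```

Equivalent options (context size, GPU offload, temperature) exist in llama.cpp's own CLI, so the same settings carry over to non-Python deployments.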
Core Capabilities
- Strong mathematical reasoning abilities (83.9% on MATH-500 benchmark)
- Step-by-step problem solving (see the chat sketch after this list)
- Efficient memory usage with GGUF format
- Support for both inference and fine-tuning
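To exercise the step-by-step behavior through a chat-style API, here is a sketch using the same binding's create_chat_completion, which applies the chat template embedded in the GGUF metadata when one is present. DeepSeek's usage notes recommend placing all instructions in the user turn rather than in a system prompt; the model path remains a placeholder:

```python
from llama_cpp import Llama

llm = Llama(model_path="DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf")  # placeholder path

# All instructions go in the user turn; the distilled R1 models are not
# intended to be driven through a system prompt.
resp = llm.create_chat_completion(
    messages=[
        {"role": "user",
         "content": "Solve x^2 - 5x + 6 = 0. Show each step of your reasoning."},
    ],
    temperature=0.6,
    max_tokens=1024,
)
print(resp["choices"][0]["message"]["content"])
```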
Frequently Asked Questions
Q: What makes this model unique?
This model distills the advanced reasoning capabilities of the much larger DeepSeek-R1 into an accessible 1.5B-parameter version, while maintaining strong performance on mathematical and reasoning benchmarks such as MATH-500.
Q: What are the recommended use cases?
The model is particularly well-suited for mathematical reasoning, step-by-step problem solving, and general reasoning tasks. It's ideal for users who need a lightweight but capable model for local deployment with limited computational resources.