DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant

Maintained By
NexaAIDev

| Property | Value |
|---|---|
| Author | NexaAIDev |
| Model Size | 1.5B parameters (quantized) |
| Model Type | Reasoning-focused Language Model |
| Hugging Face | Model Repository |

What is DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant?

DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant is a quantized version of the DeepSeek-R1-Distill-Qwen-1.5B model. Using NexaQuant technology, it reduces the model size to roughly 1/4 of the original while maintaining full accuracy, enabling efficient local deployment with significantly reduced resource requirements.

Implementation Details

The model demonstrates impressive performance metrics, achieving 66.40 tokens per second decoding speed with only 1228 MB peak RAM usage on an AMD Ryzen™ AI 9 HX 370 processor. This represents a substantial improvement over the unquantized version's 25.28 tokens per second and 3788 MB RAM usage.
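The gains implied by those figures are easy to verify. A quick check using the numbers above:

```python
# Throughput and peak-RAM figures reported for the AMD Ryzen AI 9 HX 370.
quant_tps, base_tps = 66.40, 25.28   # decoding speed, tokens per second
quant_ram, base_ram = 1228, 3788     # peak RAM usage, MB

speedup = quant_tps / base_tps        # ~2.63x faster decoding
ram_saving = 1 - quant_ram / base_ram # ~67.6% less peak RAM

print(f"decode speedup: {speedup:.2f}x")
print(f"peak RAM reduction: {ram_saving:.1%}")
```

So the quantized model decodes about 2.6x faster while using roughly a third of the memory of the unquantized version.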

  • 4-bit quantization while preserving full model accuracy
  • Compatible with Nexa-SDK, Ollama, LM Studio, and llama.cpp
  • Optimized for local deployment with minimal resource requirements
  • Maintains competitive performance on reasoning benchmarks
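NexaQuant's actual quantization scheme is not publicly documented, but the idea behind the ~4x size reduction can be illustrated with plain symmetric 4-bit block quantization. The sketch below is a generic illustration, not the NexaQuant algorithm: each block of weights is stored as 4-bit integer codes plus one per-block scale.

```python
import numpy as np

def quantize_4bit(weights, block_size=32):
    """Symmetric 4-bit block quantization: each block keeps one float
    scale plus integer codes in [-8, 7]."""
    w = weights.reshape(-1, block_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    """Reconstruct approximate float weights from codes and scales."""
    return (q * scale).reshape(shape).astype(np.float32)

# Example: quantize random weights and check the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)
err = float(np.abs(w - w_hat).max())
```

With 4 bits per weight instead of 16, the raw storage for the weights drops by a factor of four (minus a small overhead for the per-block scales); the per-element error is bounded by half a quantization step.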

Core Capabilities

  • Complex problem-solving and reasoning tasks
  • Maintains original model accuracy on key benchmarks (MMLU: 37.41, ARC Easy: 65.53)
  • Efficient local execution with reduced memory footprint
  • Supports step-by-step reasoning with specialized output formatting
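DeepSeek-R1-style models emit their step-by-step reasoning wrapped in `<think>...</think>` tags before the final answer. A minimal sketch of separating the two in application code (the helper name and sample text are illustrative, not part of any SDK):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split DeepSeek-R1-style output into (reasoning, answer),
    where reasoning is the content of the <think>...</think> block."""
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if m:
        return m.group(1).strip(), text[m.end():].strip()
    return "", text.strip()  # no think block: treat everything as the answer

sample = "<think>2 + 2 equals 4.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
```

This lets an application log or hide the chain-of-thought while showing only the final answer to end users.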

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to maintain full accuracy while achieving a 75% size reduction through NexaQuant technology, making it ideal for local deployment without the typical accuracy trade-offs associated with quantization.

Q: What are the recommended use cases?

The model excels in complex reasoning tasks, making it particularly suitable for applications requiring detailed problem-solving, such as mathematical analysis, logical reasoning, and step-by-step solution generation. It's optimized for scenarios where local deployment and data privacy are priorities.
