DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant

Maintained By
NexaAIDev

| Property | Value |
|---|---|
| Author | NexaAIDev |
| Model Size | 1.5B parameters (quantized) |
| Model Type | Reasoning-focused Language Model |
| Hugging Face | Model Repository |

What is DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant?

DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant is a quantized version of the DeepSeek-R1-Distill-Qwen-1.5B model. Using NexaQuant technology, it reduces the model size to roughly 1/4 of the original while maintaining full accuracy, enabling efficient local deployment with significantly reduced resource requirements.

Implementation Details

The model demonstrates impressive performance metrics, achieving 66.40 tokens per second decoding speed with only 1228 MB peak RAM usage on an AMD Ryzen™ AI 9 HX 370 processor. This represents a substantial improvement over the unquantized version's 25.28 tokens per second and 3788 MB RAM usage.
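The gains implied by those figures are easy to verify. A quick check using the numbers above:

```python
# Throughput and peak-RAM figures reported for the AMD Ryzen AI 9 HX 370.
quant_tps, base_tps = 66.40, 25.28   # decoding speed, tokens per second
quant_ram, base_ram = 1228, 3788     # peak RAM usage, MB

speedup = quant_tps / base_tps        # ~2.63x faster decoding
ram_saving = 1 - quant_ram / base_ram # ~67.6% less peak RAM

print(f"decode speedup: {speedup:.2f}x")
print(f"peak RAM reduction: {ram_saving:.1%}")
```

So the quantized model decodes about 2.6x faster while using roughly a third of the memory of the unquantized version.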

  • 4-bit quantization while preserving full model accuracy
  • Compatible with Nexa-SDK, Ollama, LM Studio, and llama.cpp
  • Optimized for local deployment with minimal resource requirements
  • Maintains competitive performance on reasoning benchmarks
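NexaQuant's actual quantization scheme is not publicly documented, but the idea behind the ~4x size reduction can be illustrated with plain symmetric 4-bit block quantization. The sketch below is a generic illustration, not the NexaQuant algorithm: each block of weights is stored as 4-bit integer codes plus one per-block scale.

```python
import numpy as np

def quantize_4bit(weights, block_size=32):
    """Symmetric 4-bit block quantization: each block keeps one float
    scale plus integer codes in [-8, 7]."""
    w = weights.reshape(-1, block_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    """Reconstruct approximate float weights from codes and scales."""
    return (q * scale).reshape(shape).astype(np.float32)

# Example: quantize random weights and check the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)
err = float(np.abs(w - w_hat).max())
```

With 4 bits per weight instead of 16, the raw storage for the weights drops by a factor of four (minus a small overhead for the per-block scales); the per-element error is bounded by half a quantization step.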

Core Capabilities

  • Complex problem-solving and reasoning tasks
  • Maintains original model accuracy on key benchmarks (MMLU: 37.41, ARC Easy: 65.53)
  • Efficient local execution with reduced memory footprint
  • Supports step-by-step reasoning with specialized output formatting
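DeepSeek-R1-style models emit their step-by-step reasoning wrapped in `<think>...</think>` tags before the final answer. A minimal sketch of separating the two in application code (the helper name and sample text are illustrative, not part of any SDK):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split DeepSeek-R1-style output into (reasoning, answer),
    where reasoning is the content of the <think>...</think> block."""
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if m:
        return m.group(1).strip(), text[m.end():].strip()
    return "", text.strip()  # no think block: treat everything as the answer

sample = "<think>2 + 2 equals 4.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
```

This lets an application log or hide the chain-of-thought while showing only the final answer to end users.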

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to maintain full accuracy while achieving a 75% size reduction through NexaQuant technology, making it ideal for local deployment without the typical accuracy trade-offs associated with quantization.

Q: What are the recommended use cases?

The model excels in complex reasoning tasks, making it particularly suitable for applications requiring detailed problem-solving, such as mathematical analysis, logical reasoning, and step-by-step solution generation. It's optimized for scenarios where local deployment and data privacy are priorities.
