DeepSeek-R1-Distill-Llama-8B-NexaQuant

NexaAIDev

4-bit quantized version of DeepSeek-R1-Distill-Llama-8B that stays close to the original model's accuracy while reducing size by 75%. Achieves 17.2 tokens/sec with about 5 GB of RAM.

Property       Value
Base Model     DeepSeek-R1-Distill-Llama-8B
Quantization   4-bit NexaQuant
Model URL      https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Llama-8B-NexaQuant
Developer      NexaAIDev

What is DeepSeek-R1-Distill-Llama-8B-NexaQuant?

DeepSeek-R1-Distill-Llama-8B-NexaQuant is a 4-bit quantized version of the DeepSeek-R1-Distill-Llama-8B reasoning model that stays close to the original model's benchmark accuracy while reducing the file size to one-fourth of the original. It is meant to ease the usual trade-off between model size and performance, reaching a reported 17.20 tokens per second with a peak RAM usage of 5017 MB.
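The "one-fourth" figure follows from the bit width alone. The sketch below works through that arithmetic under two assumptions that are not stated in the text: the parameter count is rounded to 8 billion, and the original weights are stored at 16 bits each. The function name `weights_size_gb` is illustrative only.

```python
def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9  # assumption: "8B" rounded to 8 billion parameters

fp16_gb = weights_size_gb(N_PARAMS, 16)  # assumption: 16-bit original weights
int4_gb = weights_size_gb(N_PARAMS, 4)   # 4-bit quantized weights
reduction = 1 - int4_gb / fp16_gb        # fraction of the original size saved

print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB, reduction: {reduction:.0%}")
```

This is an idealized estimate. Real quantized files also carry scaling factors and metadata, and the reported 5017 MB is peak RAM during inference rather than file size, so the numbers will not match exactly.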

Implementation Details

The model uses NexaQuant's 4-bit quantization, which the developer reports outperforms the standard Q4_K_M quantization method. It is compatible with several platforms, including Nexa-SDK, Ollama, LM Studio, and llama.cpp, so it can be deployed in a range of local setups.

  • Stays close to original model accuracy while reducing size by 75%
  • Achieves 17.20 tokens/second processing speed
  • Requires only 5017 MB peak RAM usage
  • Compatible with major deployment platforms
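To put the throughput figure in practical terms, the sketch below converts the reported 17.20 tokens/second into wall-clock generation time. The output lengths are arbitrary examples, and the calculation assumes the rate stays constant and ignores prompt processing time, which the text does not cover.

```python
TOKENS_PER_SECOND = 17.20  # reported decoding speed from the text


def generation_seconds(n_tokens: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Estimate the time to generate n_tokens at a constant decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


# Hypothetical output lengths, chosen only for illustration.
for n in (256, 1000, 4000):
    print(f"{n:>5} tokens: about {generation_seconds(n):.1f} s")
```

Reasoning models tend to produce long step-by-step traces, so the estimate for the larger token counts is likely the more relevant one when sizing a local deployment.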

Core Capabilities

  • Complex reasoning tasks with near-original accuracy (MMLU: 54.94 vs. 55.59 for the original)
  • Strong performance on general tasks (HellaSwag: 54.56, PIQA: 77.68)
  • Efficient local deployment with minimal resource requirements
  • Specialized in step-by-step reasoning problems
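The MMLU pair in the list above is the only benchmark given with both a quantized and an original score, so it is the one place where accuracy retention can be checked directly. A minimal sketch of that calculation, using only the two numbers from the text:

```python
# MMLU scores as quoted in the list above.
scores = {"original": 55.59, "nexaquant": 54.94}

drop = scores["original"] - scores["nexaquant"]       # absolute drop in points
retention = scores["nexaquant"] / scores["original"]  # share of the original score kept

print(f"MMLU drop: {drop:.2f} points, retention: {retention:.2%}")
```

On this benchmark the quantized model keeps roughly 98.8% of the original score, which is a small loss rather than none.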

Frequently Asked Questions

Q: What makes this model unique?

The model's key distinction is that it stays close to the accuracy of the original DeepSeek-R1-Distill-Llama-8B model (MMLU 54.94 vs. 55.59) while reducing size by 75% through NexaQuant's 4-bit quantization. This makes local deployment practical with little loss in performance.

Q: What are the recommended use cases?

The model is particularly well-suited for complex problem-solving tasks requiring detailed reasoning, especially in resource-constrained environments where maintaining high accuracy is crucial. It's ideal for local deployment scenarios requiring privacy and offline access.
