DeepSeek-R1-Distill-Qwen-7B-GGUF
| Property | Value |
|---|---|
| Base Model | DeepSeek-R1-Distill-Qwen-7B |
| Parameter Count | 7 billion |
| Format | GGUF (various quantizations) |
| Original Source | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| Author | bartowski |
What is DeepSeek-R1-Distill-Qwen-7B-GGUF?
This is a comprehensive collection of GGUF quantizations of the DeepSeek-R1-Distill-Qwen-7B model, with file sizes ranging from 2.78 GB to 30.47 GB. The quantizations were produced with llama.cpp's imatrix (importance matrix) method, providing a range of quality/size tradeoffs to suit different hardware configurations.
Implementation Details
The model uses the following prompt format: `<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|>`. Quantization options range from full F32 weights (30.47 GB) down to the highly compressed IQ2_M variant (2.78 GB). A minimal prompt-construction sketch appears after the list below.
- Utilizes llama.cpp release b4514 for quantization
- Implements imatrix quantization with specialized dataset
- Offers variants with embedding and output weights quantized to Q8_0 (e.g., Q6_K_L)
- Supports online repacking for ARM and AVX CPU inference
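To make the prompt format concrete, here is a minimal Python sketch that assembles the template by hand; the function name and example strings are illustrative and not part of the released model files.

```python
# Illustrative sketch: build the prompt template shown in this card.
# The separator inside the first special token is U+2581 ("▁"),
# not a plain ASCII underscore.

def build_prompt(system_prompt: str, user_prompt: str) -> str:
    return (
        "<|begin\u2581of\u2581sentence|>"
        f"{system_prompt}"
        f"<|User|>{user_prompt}"
        "<|Assistant|>"
    )

print(build_prompt("You are a helpful assistant.", "Why is the sky blue?"))
```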
Core Capabilities
- Multiple quantization options for different hardware requirements
- Optimized performance on various architectures (ARM, AVX)
- Enhanced tokens/watt efficiency on Apple silicon
- Support for different inference engines including LM Studio
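As one example of the inference engines mentioned above, the following is a hedged sketch of loading a quantized file with the llama-cpp-python bindings; the local file name, context size, and GPU offload setting are assumptions, not values specified by this card.

```python
# Sketch assuming llama-cpp-python is installed and the GGUF file is local;
# the file name below is one of the quantization variants listed here.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",  # assumed local path
    n_ctx=4096,        # context window; adjust for available RAM
    n_gpu_layers=-1,   # offload all layers when a GPU build is available
)

prompt = "<|begin\u2581of\u2581sentence|><|User|>Why is the sky blue?<|Assistant|>"
result = llm(prompt, max_tokens=512)
print(result["choices"][0]["text"])
```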
Frequently Asked Questions
Q: What makes this model unique?
The model offers an extensive range of quantization options with detailed performance characteristics, allowing users to choose an appropriate balance of model size, quality, and hardware compatibility. The implementation includes modern features such as online repacking and specialized embed/output weight handling.
Q: What are the recommended use cases?
For most users, the Q4_K_M variant (4.68 GB) is recommended as the default choice, offering good quality at a reasonable size. Users with limited RAM should consider the Q3_K variants, while those seeking maximum quality should opt for the Q6_K_L or Q8_0 variants if their hardware permits.
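For fetching a single file rather than the whole repository, the snippet below is a hedged sketch using the huggingface_hub Python client; the exact file name follows this repository's naming pattern and should be verified against the file list.

```python
# Sketch: download only the recommended Q4_K_M quantization.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",  # assumed file name
    local_dir=".",
)
print(path)
```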