DeepSeek-R1-Distill-Qwen-7B-GGUF
| Property | Value |
|---|---|
| Base Model | DeepSeek-R1-Distill-Qwen-7B |
| Parameter Count | 7 billion |
| Format | GGUF (various quantizations) |
| Original Source | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| Author | bartowski |
What is DeepSeek-R1-Distill-Qwen-7B-GGUF?
This is a comprehensive collection of GGUF quantizations of the DeepSeek-R1-Distill-Qwen-7B model, with file sizes ranging from 2.78 GB to 30.47 GB. The quantizations were produced with llama.cpp's imatrix (importance matrix) method, providing a range of quality/size tradeoffs to suit different hardware configurations.
Implementation Details
The model uses the following prompt format: `<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|>`. Quantization options range from full F32 weights (30.47 GB) down to the highly compressed IQ2_M variant (2.78 GB). A minimal prompt-construction sketch appears after the list below.
- Utilizes llama.cpp release b4514 for quantization
- Implements imatrix quantization with specialized dataset
- Offers variants with embedding and output weights quantized to Q8_0 (e.g., Q6_K_L)
- Supports online repacking for ARM and AVX CPU inference
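To make the prompt format concrete, here is a minimal Python sketch that assembles the template by hand; the function name and example strings are illustrative and not part of the released model files.

```python
# Illustrative sketch: build the prompt template shown in this card.
# The separator inside the first special token is U+2581 ("▁"),
# not a plain ASCII underscore.

def build_prompt(system_prompt: str, user_prompt: str) -> str:
    return (
        "<|begin\u2581of\u2581sentence|>"
        f"{system_prompt}"
        f"<|User|>{user_prompt}"
        "<|Assistant|>"
    )

print(build_prompt("You are a helpful assistant.", "Why is the sky blue?"))
```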
Core Capabilities
- Multiple quantization options for different hardware requirements
- Optimized performance on various architectures (ARM, AVX)
- Enhanced tokens/watt efficiency on Apple silicon
- Support for different inference engines including LM Studio
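As one example of the inference engines mentioned above, the following is a hedged sketch of loading a quantized file with the llama-cpp-python bindings; the local file name, context size, and GPU offload setting are assumptions, not values specified by this card.

```python
# Sketch assuming llama-cpp-python is installed and the GGUF file is local;
# the file name below is one of the quantization variants listed here.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",  # assumed local path
    n_ctx=4096,        # context window; adjust for available RAM
    n_gpu_layers=-1,   # offload all layers when a GPU build is available
)

prompt = "<|begin\u2581of\u2581sentence|><|User|>Why is the sky blue?<|Assistant|>"
result = llm(prompt, max_tokens=512)
print(result["choices"][0]["text"])
```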
Frequently Asked Questions
Q: What makes this model unique?
The model offers an extensive range of quantization options with detailed performance characteristics, allowing users to choose an appropriate balance of model size, quality, and hardware compatibility. The implementation includes modern features such as online repacking and specialized embed/output weight handling.
Q: What are the recommended use cases?
For most users, the Q4_K_M variant (4.68 GB) is recommended as the default choice, offering good quality at a reasonable size. Users with limited RAM should consider the Q3_K variants, while those seeking maximum quality should opt for the Q6_K_L or Q8_0 variants if their hardware permits.
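For fetching a single file rather than the whole repository, the snippet below is a hedged sketch using the huggingface_hub Python client; the exact file name follows this repository's naming pattern and should be verified against the file list.

```python
# Sketch: download only the recommended Q4_K_M quantization.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",  # assumed file name
    local_dir=".",
)
print(path)
```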