DeepSeek-R1-Distill-Llama-70B GGUF Quantizations
| Property | Value |
|---|---|
| Original Model | DeepSeek-R1-Distill-Llama-70B-abliterated |
| Quantization Types | 23 variants (Q8_0 to IQ1_M) |
| Size Range | 16.75GB - 74.98GB |
| Author | bartowski |
What is huihui-ai_DeepSeek-R1-Distill-Llama-70B-abliterated-GGUF?
This is a comprehensive collection of GGUF quantized versions of huihui-ai's abliterated DeepSeek-R1-Distill-Llama-70B model, created using llama.cpp's imatrix quantization. The collection offers various compression levels to accommodate different hardware capabilities and use-case requirements, from the very high-quality Q8_0 quantization down to the highly compressed IQ1_M version.
Implementation Details
The model uses the following prompt format: `<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|><|end▁of▁sentence|><|Assistant|>`. All quantizations were produced with llama.cpp release b4585 using an imatrix calibration dataset. Notable variants include:
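As a minimal sketch, assuming a single user turn (so the prompt ends at the first `<|Assistant|>` tag, where generation begins), the template can be filled with plain string formatting. The example strings here are arbitrary placeholders:

```python
# Fill in the DeepSeek-R1 distill prompt template shown above.
# The "▁" characters are literal parts of the special tokens,
# not ordinary underscores.
system_prompt = "You are a helpful assistant."
user_prompt = "Summarize GGUF quantization in one paragraph."

prompt = (
    "<|begin▁of▁sentence|>"
    f"{system_prompt}"
    f"<|User|>{user_prompt}"
    "<|Assistant|>"
)
print(prompt)
```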
- Q8_0 (74.98GB): Highest-quality quantization, for users who need maximum accuracy
- Q6_K (57.89GB): Very high quality with near-perfect output; recommended for production use
- Q4_K_M (42.52GB): Balanced quality-to-size ratio; recommended for most use cases (a download sketch follows this list)
- IQ4_XS (37.90GB): I-quant offering good quality at a smaller size
- IQ2_XXS (19.10GB): Aggressively compressed I-quant that remains usable despite its small footprint
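To fetch just one of the variants listed above, a minimal sketch with the `huggingface_hub` client works; the repository id matches the title of this card, and the filename pattern assumes the quant type appears in the filename, as it does in bartowski's repos:

```python
# Download only the Q4_K_M files (~42.5GB) instead of the whole repo.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bartowski/huihui-ai_DeepSeek-R1-Distill-Llama-70B-abliterated-GGUF",
    allow_patterns=["*Q4_K_M*"],  # matches single or split .gguf shards
    local_dir="models",
)
```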
Core Capabilities
- Multiple quantization options for different hardware configurations
- Support for ARM and AVX CPU inference with online repacking
- Compatibility with LM Studio and other llama.cpp-based projects
- Split file support for larger quantizations (see the loading sketch below)
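As a loading sketch, assuming the `llama-cpp-python` bindings and a hypothetical local path: for split quantizations, passing the first `-00001-of-0000N.gguf` shard should let llama.cpp locate the remaining parts automatically.

```python
from llama_cpp import Llama

# Illustrative path and parameters; tune n_gpu_layers to your VRAM.
llm = Llama(
    model_path="models/DeepSeek-R1-Distill-Llama-70B-abliterated-Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # -1 offloads all layers to GPU; 0 runs CPU-only
)

prompt = (
    "<|begin▁of▁sentence|>You are a helpful assistant."
    "<|User|>What is GGUF?<|Assistant|>"
)
result = llm(prompt, max_tokens=256, stop=["<|end▁of▁sentence|>"])
print(result["choices"][0]["text"])
```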
Frequently Asked Questions
Q: What makes this model unique?
This collection stands out for its comprehensive range of quantization options, spanning both traditional K-quants and newer I-quants, which lets users balance model size, output quality, and hardware compatibility.
Q: What are the recommended use cases?
For most users, the Q4_K_M variant is recommended as it provides a good balance of quality and size. Users with limited RAM should consider IQ3/IQ2 variants, while those requiring maximum quality should opt for Q6_K or Q8_0 variants.
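As a rough sketch of that advice (the sizes come from the variant list above; the 2GB of headroom for KV cache and runtime overhead is a common rule of thumb, not a figure from the card), quant selection can be automated:

```python
# Rule-of-thumb quant picker: choose the largest variant that fits in
# available memory with ~2GB of headroom for KV cache and overhead.
VARIANTS = {
    "Q8_0": 74.98,
    "Q6_K": 57.89,
    "Q4_K_M": 42.52,
    "IQ4_XS": 37.90,
    "IQ2_XXS": 19.10,
    "IQ1_M": 16.75,
}

def pick_quant(available_gb: float, headroom_gb: float = 2.0) -> str | None:
    """Return the highest-quality variant that fits, or None if none do."""
    for name, size_gb in sorted(VARIANTS.items(), key=lambda kv: -kv[1]):
        if size_gb + headroom_gb <= available_gb:
            return name
    return None

print(pick_quant(48.0))  # -> 'Q4_K_M' on a machine with 48GB free
```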