DeepSeek-R1-Distill-Llama-70B GGUF Quantizations
| Property | Value |
|---|---|
| Original Model | DeepSeek-R1-Distill-Llama-70B-abliterated |
| Quantization Types | 23 variants (Q8_0 to IQ1_M) |
| Size Range | 16.75GB - 74.98GB |
| Author | bartowski |
What is huihui-ai_DeepSeek-R1-Distill-Llama-70B-abliterated-GGUF?
This is a comprehensive collection of GGUF quantized versions of huihui-ai's abliterated DeepSeek-R1-Distill-Llama-70B model, created using llama.cpp's imatrix quantization. The collection offers various compression levels to accommodate different hardware capabilities and use-case requirements, from the very high-quality Q8_0 quantization down to the highly compressed IQ1_M version.
Implementation Details
The model uses the following prompt format: `<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|><|end▁of▁sentence|><|Assistant|>`. All quantizations were produced with llama.cpp release b4585 using an imatrix calibration dataset. Notable variants include:
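As a minimal sketch, assuming a single user turn (so the prompt ends at the first `<|Assistant|>` tag, where generation begins), the template can be filled with plain string formatting. The example strings here are arbitrary placeholders:

```python
# Fill in the DeepSeek-R1 distill prompt template shown above.
# The "▁" characters are literal parts of the special tokens,
# not ordinary underscores.
system_prompt = "You are a helpful assistant."
user_prompt = "Summarize GGUF quantization in one paragraph."

prompt = (
    "<|begin▁of▁sentence|>"
    f"{system_prompt}"
    f"<|User|>{user_prompt}"
    "<|Assistant|>"
)
print(prompt)
```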
- Q8_0 (74.98GB): Highest-quality quantization, for users who need maximum accuracy
- Q6_K (57.89GB): Very high quality with near-perfect output; recommended for production use
- Q4_K_M (42.52GB): Balanced quality-to-size ratio; recommended for most use cases (a download sketch follows this list)
- IQ4_XS (37.90GB): I-quant offering good quality at a smaller size
- IQ2_XXS (19.10GB): Aggressively compressed I-quant that remains usable despite its small footprint
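To fetch just one of the variants listed above, a minimal sketch with the `huggingface_hub` client works; the repository id matches the title of this card, and the filename pattern assumes the quant type appears in the filename, as it does in bartowski's repos:

```python
# Download only the Q4_K_M files (~42.5GB) instead of the whole repo.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bartowski/huihui-ai_DeepSeek-R1-Distill-Llama-70B-abliterated-GGUF",
    allow_patterns=["*Q4_K_M*"],  # matches single or split .gguf shards
    local_dir="models",
)
```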
Core Capabilities
- Multiple quantization options for different hardware configurations
- Support for ARM and AVX CPU inference with online repacking
- Compatibility with LM Studio and other llama.cpp-based projects
- Split file support for larger quantizations (see the loading sketch below)
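As a loading sketch, assuming the `llama-cpp-python` bindings and a hypothetical local path: for split quantizations, passing the first `-00001-of-0000N.gguf` shard should let llama.cpp locate the remaining parts automatically.

```python
from llama_cpp import Llama

# Illustrative path and parameters; tune n_gpu_layers to your VRAM.
llm = Llama(
    model_path="models/DeepSeek-R1-Distill-Llama-70B-abliterated-Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # -1 offloads all layers to GPU; 0 runs CPU-only
)

prompt = (
    "<|begin▁of▁sentence|>You are a helpful assistant."
    "<|User|>What is GGUF?<|Assistant|>"
)
result = llm(prompt, max_tokens=256, stop=["<|end▁of▁sentence|>"])
print(result["choices"][0]["text"])
```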
Frequently Asked Questions
Q: What makes this model unique?
This collection stands out for its comprehensive range of quantization options, spanning both traditional K-quants and newer I-quants, which lets users balance model size, output quality, and hardware compatibility.
Q: What are the recommended use cases?
For most users, the Q4_K_M variant is recommended as it provides a good balance of quality and size. Users with limited RAM should consider IQ3/IQ2 variants, while those requiring maximum quality should opt for Q6_K or Q8_0 variants.
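As a rough sketch of that advice (the sizes come from the variant list above; the 2GB of headroom for KV cache and runtime overhead is a common rule of thumb, not a figure from the card), quant selection can be automated:

```python
# Rule-of-thumb quant picker: choose the largest variant that fits in
# available memory with ~2GB of headroom for KV cache and overhead.
VARIANTS = {
    "Q8_0": 74.98,
    "Q6_K": 57.89,
    "Q4_K_M": 42.52,
    "IQ4_XS": 37.90,
    "IQ2_XXS": 19.10,
    "IQ1_M": 16.75,
}

def pick_quant(available_gb: float, headroom_gb: float = 2.0) -> str | None:
    """Return the highest-quality variant that fits, or None if none do."""
    for name, size_gb in sorted(VARIANTS.items(), key=lambda kv: -kv[1]):
        if size_gb + headroom_gb <= available_gb:
            return name
    return None

print(pick_quant(48.0))  # -> 'Q4_K_M' on a machine with 48GB free
```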