huihui-ai_DeepSeek-R1-Distill-Llama-70B-abliterated-GGUF

Maintained By
bartowski

DeepSeek-R1-Distill-Llama-70B-abliterated GGUF Quantizations

Property            Value
------------------  -----------------------------------------
Original Model      DeepSeek-R1-Distill-Llama-70B-abliterated
Quantization Types  23 variants (Q8_0 to IQ1_M)
Size Range          16.75GB - 74.98GB
Author              bartowski

What is huihui-ai_DeepSeek-R1-Distill-Llama-70B-abliterated-GGUF?

This is a comprehensive collection of GGUF quantizations of huihui-ai's abliterated variant of DeepSeek-R1-Distill-Llama-70B, created with llama.cpp's imatrix quantization. The collection spans compression levels for a wide range of hardware and use cases, from the near-lossless Q8_0 down to the heavily compressed IQ1_M.
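
Rather than cloning all 23 variants, a single one can be fetched with the huggingface_hub library. This is a minimal sketch assuming the repo id matches this card's title; verify it on Hugging Face before use:

```python
# Minimal sketch: download only the Q4_K_M files (~42.5GB) from the repo.
# The repo id is inferred from this card's title -- an assumption to verify.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bartowski/huihui-ai_DeepSeek-R1-Distill-Llama-70B-abliterated-GGUF",
    allow_patterns=["*Q4_K_M*"],  # glob matching the variant's files
    local_dir="models",
)
```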

Implementation Details

The model uses the prompt format <|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|><|end▁of▁sentence|><|Assistant|>. The quantizations were performed with llama.cpp release b4585, which implements the current state-of-the-art compression techniques.
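
As a hedged illustration, a single-turn prompt can be assembled from that template in plain Python. Treating generation as starting after the first <|Assistant|> tag is an assumption here; the trailing <|end▁of▁sentence|><|Assistant|> pair presumably applies when replaying multi-turn history:

```python
# Sketch: fill the template's placeholders for a single turn. The sentinel
# strings are copied verbatim from the prompt format quoted above.
system_prompt = "You are a helpful assistant."
user_prompt = "Summarize GGUF quantization in two sentences."

formatted = (
    f"<|begin▁of▁sentence|>{system_prompt}"
    f"<|User|>{user_prompt}"
    "<|Assistant|>"  # the model's reply is generated after this tag
)
print(formatted)
```

Key variants in the collection: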

  • Q8_0 (74.98GB): Highest-quality quantization, for users who need maximum accuracy
  • Q6_K (57.89GB): Very high quality with near-perfect performance, recommended for production use
  • Q4_K_M (42.52GB): Balanced quality-size ratio, recommended for most use cases
  • IQ4_XS (37.90GB): Newer I-quant format offering good performance at smaller sizes
  • IQ2_XXS (19.10GB): Ultra-compressed version using SOTA techniques while remaining usable (see the selection sketch below)
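
Which variant fits depends mostly on free RAM/VRAM. The helper below is an illustrative sketch, not part of the release; the sizes come from this card, and the 2GB headroom default is a rough rule of thumb for context and KV cache:

```python
# Illustrative: pick the largest (highest-quality) variant that fits in the
# available memory while leaving some headroom for context/KV cache.
VARIANTS = {  # variant name -> file size in GB, from this card
    "Q8_0": 74.98,
    "Q6_K": 57.89,
    "Q4_K_M": 42.52,
    "IQ4_XS": 37.90,
    "IQ2_XXS": 19.10,
}

def pick_variant(available_gb: float, headroom_gb: float = 2.0) -> str | None:
    fitting = {name: size for name, size in VARIANTS.items()
               if size + headroom_gb <= available_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_variant(48.0))  # -> 'Q4_K_M', e.g. on two 24GB GPUs
```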

Core Capabilities

  • Multiple quantization options for different hardware configurations
  • Support for ARM and AVX CPU inference with online repacking
  • Compatibility with LM Studio and other llama.cpp-based projects
  • Split file support for larger quantizations (see the loading sketch below)
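
Quantizations in the upper size range ship as multiple shards, and llama.cpp is generally able to discover the remaining shards when given the first file. A hedged sketch using the llama-cpp-python bindings, where the shard filename is hypothetical (check the repo for the real names):

```python
# Sketch: load a sharded GGUF by pointing at the first shard; llama.cpp
# picks up -00002-of-..., etc. automatically.
from llama_cpp import Llama

llm = Llama(
    # Hypothetical filename -- substitute the repo's actual first shard.
    model_path="models/DeepSeek-R1-Distill-Llama-70B-abliterated-Q8_0-00001-of-00002.gguf",
    n_gpu_layers=-1,  # offload all layers if VRAM allows; 0 for CPU-only
    n_ctx=4096,
)
out = llm("<|begin▁of▁sentence|><|User|>Hello!<|Assistant|>", max_tokens=64)
print(out["choices"][0]["text"])
```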

Frequently Asked Questions

Q: What makes this model unique?

This collection stands out for its comprehensive range of quantization options, covering both traditional K-quants and newer I-quants, so users can trade off model size, quality, and hardware compatibility to fit their setup.

Q: What are the recommended use cases?

For most users, the Q4_K_M variant is recommended as it provides a good balance of quality and size. Users with limited RAM should consider IQ3/IQ2 variants, while those requiring maximum quality should opt for Q6_K or Q8_0 variants.
