DeepSeek-R1-Distill-Qwen-32B-abliterated-GGUF

Maintained by: bartowski

  • Original Model: DeepSeek-R1-Distill-Qwen-32B-abliterated
  • Quantization Framework: llama.cpp (b4546)
  • Size Range: 9.03GB - 65.54GB
  • Format: GGUF

What is DeepSeek-R1-Distill-Qwen-32B-abliterated-GGUF?

This is a comprehensive collection of quantized versions of the DeepSeek-R1-Distill-Qwen-32B model, optimized for different hardware configurations and use cases. The quantizations range from extremely high quality (Q8_0) to very compressed but still usable versions (IQ2_XXS), enabling users to choose the best trade-off between model size and performance for their specific needs.
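As a quick illustration, a single variant can be pulled down with the huggingface_hub Python library; the file name below follows the repo's usual naming convention and is an assumption to verify against the actual file list on the Hub.

```python
# Sketch: download one quantized variant with huggingface_hub.
# The filename is an assumption based on the repo's naming convention;
# check the repo's file list for the exact quant you want.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="bartowski/DeepSeek-R1-Distill-Qwen-32B-abliterated-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-32B-abliterated-Q4_K_M.gguf",
)
print(f"Downloaded to {model_path}")
```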

Implementation Details

The model uses imatrix quantization techniques and comes in multiple variants, each optimized for different scenarios. The quantization types include standard K-quants (Q2-Q8) and newer I-quants (IQ2-IQ4), with special versions that use Q8_0 for embedding and output weights to maintain higher quality in critical model components.

  • Prompt format: uses dedicated tokens for system, user, and assistant turns (see the template after this list)
  • Multiple quantization options ranging from 9GB to 65GB
  • Supports online repacking for ARM and AVX CPU inference
  • Special variants with Q8_0 embed/output weights for enhanced quality
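
For reference, the prompt template typically documented for the DeepSeek-R1-Distill family is shown below (note the fullwidth ｜ and ▁ characters inside the special tokens); verify it against the model's tokenizer config before relying on it:

```
<｜begin▁of▁sentence｜>{system_prompt}<｜User｜>{prompt}<｜Assistant｜>
```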

Core Capabilities

  • Flexible deployment options for various hardware configurations
  • Optimized performance on different architectures (ARM, AVX, CUDA)
  • Quality-size trade-off options for different use cases
  • Support for both CPU and GPU inference (a minimal sketch follows this list)
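
To make the CPU/GPU flexibility concrete, here is a minimal inference sketch using llama-cpp-python, one of several llama.cpp-based runtimes (the file name and parameter values are assumptions, not fixed requirements):

```python
# Sketch: load a quantized GGUF file and run chat inference with
# llama-cpp-python. n_gpu_layers controls the CPU/GPU split: 0 keeps
# everything on the CPU, -1 offloads all layers to the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-32B-abliterated-Q4_K_M.gguf",
    n_ctx=4096,       # context window; adjust to available memory
    n_gpu_layers=-1,  # -1 = full GPU offload, 0 = CPU only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize GGUF quantization in two sentences."}]
)
print(out["choices"][0]["message"]["content"])
```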

Frequently Asked Questions

Q: What makes this model unique?

The model offers an exceptionally wide range of quantization options, from full BF16 precision to highly compressed versions, making it adaptable to various hardware constraints while maintaining usability. The implementation of both K-quants and I-quants provides users with optimal choices for different inference backends.

Q: What are the recommended use cases?

For maximum quality, use Q6_K_L or Q6_K versions. For balanced performance, Q4_K_M is recommended as the default choice. For limited RAM scenarios, I-quants like IQ4_XS offer good quality with smaller size. GPU users should consider K-quants for Vulkan or I-quants for cuBLAS/rocBLAS.
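
A common rule of thumb is to pick the largest file that fits 1-2GB below available VRAM (or combined RAM plus VRAM when partially offloading). As a sketch, the choice can even be automated by querying file sizes through the Hub API; the 24GB budget below is an assumption for illustration:

```python
# Sketch: pick the largest .gguf quant that fits a memory budget.
# Uses the Hugging Face Hub API to read per-file sizes.
from huggingface_hub import HfApi

info = HfApi().model_info(
    "bartowski/DeepSeek-R1-Distill-Qwen-32B-abliterated-GGUF",
    files_metadata=True,
)
budget = 24 * 1024**3  # assumed 24GB budget; leave headroom for context/KV cache

candidates = [
    s for s in info.siblings
    if s.rfilename.endswith(".gguf") and s.size and s.size <= budget
]
best = max(candidates, key=lambda s: s.size, default=None)
if best is not None:
    print(f"{best.rfilename}: {best.size / 1024**3:.2f} GiB")
```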
