DeepSeek-R1-Distill-Qwen-32B-abliterated-GGUF

bartowski

A set of quantized GGUF builds of DeepSeek-R1-Distill-Qwen-32B-abliterated at compression levels from roughly 9GB to 65GB, letting users match model size to their hardware and RAM constraints

Original Model: DeepSeek-R1-Distill-Qwen-32B-abliterated
Quantization Framework: llama.cpp (b4546)
Size Range: 9.03GB – 65.54GB
Format: GGUF

What is DeepSeek-R1-Distill-Qwen-32B-abliterated-GGUF?

This is a comprehensive collection of quantized versions of the DeepSeek-R1-Distill-Qwen-32B model, optimized for different hardware configurations and use cases. The quantizations range from extremely high quality (Q8_0) to very compressed but still usable versions (IQ2_XXS), enabling users to choose the best trade-off between model size and performance for their specific needs.

Implementation Details

The model uses imatrix quantization techniques and comes in multiple variants, each optimized for different scenarios. The quantization types include standard K-quants (Q2-Q8) and newer I-quants (IQ2-IQ4), with special versions that use Q8_0 for embedding and output weights to maintain higher quality in critical model components.

  • Prompt format: Uses specific tokens for system, user, and assistant interactions
  • Multiple quantization options ranging from 9GB to 65GB
  • Supports online repacking for ARM and AVX CPU inference
  • Special variants with Q8_0 embed/output weights for enhanced quality
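The prompt format mentioned above can be sketched as a small helper. The special tokens below are assumptions based on the upstream DeepSeek-R1 chat template (fullwidth bars and underscore-like block characters); verify them against the model's tokenizer configuration before use.

```python
# Sketch of the DeepSeek-R1 single-turn prompt format (token names assumed
# from the upstream DeepSeek-R1 template; check tokenizer_config.json).
BOS = "<\uff5cbegin\u2581of\u2581sentence\uff5c>"
USER = "<\uff5cUser\uff5c>"
ASSISTANT = "<\uff5cAssistant\uff5c>"

def build_prompt(user_message: str, system_prompt: str = "") -> str:
    """Assemble a raw completion-mode prompt string for llama.cpp."""
    return f"{BOS}{system_prompt}{USER}{user_message}{ASSISTANT}"

print(build_prompt("Why is the sky blue?"))
```

A string assembled this way would be passed to llama.cpp's completion endpoint directly, rather than through a chat API that applies its own template.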

Core Capabilities

  • Flexible deployment options for various hardware configurations
  • Optimized performance on different architectures (ARM, AVX, CUDA)
  • Quality-size trade-off options for different use cases
  • Support for both CPU and GPU inference
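Each quantization level ships as a separate `.gguf` file in the repository. As a minimal sketch of locating one, the helper below builds the direct download URL; the `<model>-<quant>.gguf` filename pattern is an assumption based on bartowski's usual naming and should be checked against the repo's file listing.

```python
# Sketch: constructing the direct-download URL for a given quant.
# The filename pattern is an assumption; verify against the repo file list.
REPO_ID = "bartowski/DeepSeek-R1-Distill-Qwen-32B-abliterated-GGUF"
MODEL_BASE = "DeepSeek-R1-Distill-Qwen-32B-abliterated"

def quant_filename(quant: str) -> str:
    return f"{MODEL_BASE}-{quant}.gguf"

def download_url(quant: str) -> str:
    # Hugging Face serves raw files via the /resolve/<revision>/ path.
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{quant_filename(quant)}"

print(download_url("Q4_K_M"))
```

With the `huggingface_hub` package installed, `hf_hub_download(repo_id=REPO_ID, filename=quant_filename("Q4_K_M"))` would fetch the same file with caching and resume support.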

Frequently Asked Questions

Q: What makes this model unique?

The model offers an exceptionally wide range of quantization options, from full BF16 precision to highly compressed versions, making it adaptable to various hardware constraints while maintaining usability. The implementation of both K-quants and I-quants provides users with optimal choices for different inference backends.

Q: What are the recommended use cases?

For maximum quality, use Q6_K_L or Q6_K versions. For balanced performance, Q4_K_M is recommended as the default choice. For limited RAM scenarios, I-quants like IQ4_XS offer good quality with smaller size. GPU users should consider K-quants for Vulkan or I-quants for cuBLAS/rocBLAS.
