UncensoredLM-DeepSeek-R1-Distill-Qwen-14B-GGUF
Property | Value |
---|---|
Base Model | DeepSeek-R1-Distill-Qwen-14B |
Quantization Types | Multiple (F32 to IQ2_XS) |
Model URL | HuggingFace/uncensoredai |
Format | GGUF (llama.cpp compatible) |
What is uncensoredai_UncensoredLM-DeepSeek-R1-Distill-Qwen-14B-GGUF?
This is a comprehensive collection of GGUF quantized versions of the UncensoredLM model, built on DeepSeek-R1-Distill-Qwen-14B. The quantization options range from full F32 precision (56.88GB) down to the highly compressed IQ2_XS (4.54GB), letting users trade output quality against memory and compute requirements.
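To make the size trade-off concrete, the sketch below fetches a single quantized file with the `huggingface_hub` library. The `repo_id` and `filename` are assumptions inferred from the model name and the quantization types listed here, so check the repository's file list for the exact names.

```python
# Minimal download sketch. The repo_id and filename are assumptions based on
# the model name and quant types in this card; verify them on the repo page.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="uncensoredai/UncensoredLM-DeepSeek-R1-Distill-Qwen-14B-GGUF",  # assumed repo id
    filename="UncensoredLM-DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf",       # assumed file name
)
print(model_path)  # local path to the cached GGUF file
```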
Implementation Details
The quantizations were produced with llama.cpp and include both traditional K-quants and the newer I-quants. The model expects a specific prompt format, and the various file sizes target different RAM/VRAM budgets; a short loading sketch follows the feature list below.
- Implements imatrix quantization with a custom calibration dataset
- Offers specialized embed/output weight handling in certain quantizations
- Supports online weight repacking for ARM and AVX CPU inference
- Multiple quantization options for different use cases
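As a rough illustration of how a quant from this collection can be run, the sketch below loads a file with the `llama-cpp-python` bindings; recent versions read the chat template from the GGUF metadata, so the model's prompt formatting is applied for you. The file name, context size, and offload settings are assumptions, not values from this card.

```python
# Minimal inference sketch with llama-cpp-python (assumes the Q4_K_M file
# from the download example above is in the working directory).
from llama_cpp import Llama

llm = Llama(
    model_path="UncensoredLM-DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf",
    n_ctx=4096,       # context window; raise if RAM allows
    n_gpu_layers=-1,  # offload all layers to GPU/Metal; use 0 for CPU-only
)

# create_chat_completion uses the chat template embedded in the GGUF,
# so the model's special prompt formatting is handled automatically.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization does."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```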
Core Capabilities
- Flexible deployment options from high-end servers to resource-constrained environments
- Optimized performance on various hardware (CPU, GPU, Apple Silicon)
- Compatible with LM Studio and other llama.cpp-based projects
- Special handling for embeddings and output weights in XL/L variants
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its extensive range of quantization options and its focus on uncensored responses, while the imatrix-calibrated I-quants and K-quants keep quality loss small at each file size.
Q: What are the recommended use cases?
For most users, the Q4_K_M (8.64GB) variant is recommended as a balanced option. High-end users should consider Q6_K_L (12.05GB) for better quality, while those with limited resources might opt for IQ4_XS (7.83GB) or lower variants.
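As a rough sketch of that guidance, the helper below picks the largest of the variants named in this card that fits a given memory budget. The sizes are the file sizes quoted above, while the 2 GB headroom for context and runtime overhead is an assumption, not a measured figure.

```python
# Rough sketch: choose the largest listed quant that fits the available
# RAM/VRAM, keeping ~2 GB of headroom for the KV cache and runtime overhead
# (the headroom figure is an assumption, not a benchmark).
QUANT_SIZES_GB = {
    "Q6_K_L": 12.05,  # higher quality
    "Q4_K_M": 8.64,   # recommended balanced option
    "IQ4_XS": 7.83,   # smaller, for limited resources
    "IQ2_XS": 4.54,   # most compressed option listed
}

def pick_quant(available_gb: float, headroom_gb: float = 2.0) -> str | None:
    budget = available_gb - headroom_gb
    for name, size in sorted(QUANT_SIZES_GB.items(), key=lambda kv: -kv[1]):
        if size <= budget:
            return name
    return None  # nothing listed fits; consider an even smaller quant

print(pick_quant(12.0))  # -> "Q4_K_M" on a 12 GB GPU
```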