# DeepSeek-v2.5-1210-UD-gguf
| Property | Value |
|---|---|
| Author | Enturbulate |
| Model Type | Mixture of Experts (MoE) |
| Available Variants | iq1_s, iq1_m, iq2_xxs, iq2_s, iq3_m |
| Size Range | 49 GB – 97 GB |
| Repository | Hugging Face |
## What is DeepSeek-v2.5-1210-UD-gguf?
DeepSeek-v2.5-1210-UD-gguf is an optimized build of the DeepSeek v2.5 model that applies dynamic quantization techniques inspired by Unsloth's successful dynamic quants of DeepSeek-R1. It ships in multiple compression variants while preserving model quality through selective layer compression.
## Implementation Details
The model uses a straightforward quantization strategy that prevents attention and output layers from dropping below q4_k compression. The implementation builds on Unsloth's llama.cpp fork, with modifications tailored to the v2.5 architecture; a sketch of the selection logic follows the list below.
- Multiple quantization variants ranging from 49GB to 97GB
- Modified llama-quant.cpp for optimized tensor compression
- Specialized handling of attention/output layers
- Architecture-specific adaptations for v2.5 model structure
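The card does not reproduce the patch itself, but the selection rule can be sketched as a per-tensor type override. Below is a minimal illustrative sketch in Python, not the author's actual llama-quant.cpp change: the tensor-name substrings and the quant-type ranking are assumptions made for demonstration.

```python
# Illustrative sketch of a dynamic-quant type override (hypothetical, not the
# author's llama-quant.cpp patch): attention and output tensors are clamped so
# they never drop below q4_k, while other tensors keep the requested low-bit type.

# Relative quality ranking of the quant types mentioned in this card (assumption).
QUANT_RANK = ["iq1_s", "iq1_m", "iq2_xxs", "iq2_s", "iq3_m", "q4_k"]

# Name substrings identifying tensors that must stay at q4_k or better (assumption).
PROTECTED = ("attn", "output")

def pick_quant_type(tensor_name: str, requested: str) -> str:
    """Return the quant type for one tensor under the 'never below q4_k' rule."""
    if any(key in tensor_name for key in PROTECTED):
        # Clamp protected tensors up to at least q4_k.
        if QUANT_RANK.index(requested) < QUANT_RANK.index("q4_k"):
            return "q4_k"
    return requested

if __name__ == "__main__":
    for name in ("blk.0.attn_q.weight", "blk.0.ffn_gate_exps.weight", "output.weight"):
        print(name, "->", pick_quant_type(name, "iq1_s"))
```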
## Core Capabilities
- Efficient model compression while maintaining performance
- Multiple size variants for different hardware configurations
- Improved performance compared to standard llama.cpp low-bit quants
- Optimized memory usage through strategic layer compression
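Any llama.cpp-based runtime that supports the v2.5 architecture can load these files. Here is a minimal usage sketch with the llama-cpp-python bindings; the .gguf file name is hypothetical and stands in for whichever variant you downloaded.

```python
# Minimal sketch: loading a quantized variant with llama-cpp-python.
# The file name below is hypothetical; point it at the .gguf you actually have.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-v2.5-1210-UD-iq2_s.gguf",  # hypothetical file name
    n_ctx=4096,        # context window; reduce if memory is tight
    n_gpu_layers=-1,   # offload all layers to GPU if they fit, else lower this
)

out = llm("Explain mixture-of-experts models in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```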
## Frequently Asked Questions
### Q: What makes this model unique?
This model stands out for bringing dynamic quantization techniques to the DeepSeek v2.5 architecture, offering a range of compression levels that deliver better quality than standard low-bit quantization approaches.
### Q: What are the recommended use cases?
The model is particularly useful for running DeepSeek v2.5 on limited computational resources: the multiple size variants let users match the download to their hardware while retaining as much model quality as possible.
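As a rough rule of thumb, pick the largest variant whose file size fits in available RAM/VRAM with headroom for the KV cache. A small illustrative helper follows; note that only the 49 GB and 97 GB endpoints come from the table above, while the intermediate figures are placeholder assumptions.

```python
# Illustrative helper (assumption: a variant needs roughly its file size in
# memory, plus headroom for KV cache and activations).
VARIANT_SIZES_GB = {
    # Only the 49 GB and 97 GB endpoints come from this card; the middle
    # values are placeholder assumptions for demonstration.
    "iq1_s": 49, "iq1_m": 55, "iq2_xxs": 63, "iq2_s": 69, "iq3_m": 97,
}

def largest_fitting_variant(memory_gb: float, headroom_gb: float = 8.0) -> str | None:
    """Pick the biggest (highest-quality) variant that fits the memory budget."""
    fitting = {v: s for v, s in VARIANT_SIZES_GB.items() if s + headroom_gb <= memory_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(largest_fitting_variant(80))  # -> 'iq2_s' under these assumed sizes
```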