DeepSeek-R1-int2-mixed-sym-inc

Maintained by: OPEA

Property         Value
Author           OPEA
Paper            Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Model Type       INT2 Quantized Language Model
Implementation   Mixed Precision (2/4/16-bit)

What is DeepSeek-R1-int2-mixed-sym-inc?

DeepSeek-R1-int2-mixed-sym-inc is a quantized version of the DeepSeek-R1 language model built around symmetric INT2 quantization with a group size of 64. Selected layers are kept at 4-bit or 16-bit precision, preserving accuracy while substantially reducing model size.

Implementation Details

The model uses a mixed-precision quantization strategy: most layers are stored at 2-bit precision, while selected layers fall back to 4-bit or 16-bit precision. This keeps compression high without sacrificing much accuracy. The model can be deployed on both CPU and CUDA devices, with the CPU path potentially yielding better accuracy due to overflow protection.

  • Utilizes INT2 quantization with group size 64
  • Implements symmetric quantization for weight representation (see the sketch after this list)
  • Strategic fallback to 4-bit and 16-bit precision for critical layers
  • Supports both CPU and CUDA deployment options
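
To make the group-wise symmetric scheme concrete, the sketch below shows plain round-to-nearest symmetric quantization with one shared scale per group of 64 weights. It is only illustrative: the referenced paper learns the rounding via signed gradient descent rather than using round(), and the function name, shapes, and tensor layout here are hypothetical rather than taken from the released code.

```python
import torch

def quantize_sym_group(weight: torch.Tensor, bits: int = 2, group_size: int = 64):
    """Illustrative symmetric, group-wise round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1        # +1 for INT2
    qmin = -(2 ** (bits - 1))         # -2 for INT2
    out_features, in_features = weight.shape
    w = weight.reshape(out_features, in_features // group_size, group_size)

    # Symmetric: one scale per group (no zero point), from the group's absolute maximum.
    scale = w.abs().amax(dim=-1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)     # guard against all-zero groups

    q = torch.clamp(torch.round(w / scale), qmin, qmax)
    w_hat = (q * scale).reshape(out_features, in_features)
    return q.to(torch.int8), scale, w_hat

# Toy example: an 8x128 weight matrix split into groups of 64
w = torch.randn(8, 128)
q, scale, w_hat = quantize_sym_group(w)
print("max abs reconstruction error:", (w - w_hat).abs().max().item())
```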

Core Capabilities

  • Maintains strong performance on MMLU (0.8302 vs 0.8514 for BF16)
  • Competitive accuracy on the ARC Challenge benchmark (0.6084 vs 0.6212)
  • Effective on commonsense reasoning benchmarks such as HellaSwag and WinoGrande
  • Significant model size reduction while preserving functionality

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its innovative mixed-precision quantization approach, achieving extreme compression to INT2 while strategically preserving higher precision where needed. It demonstrates that aggressive quantization can maintain strong performance when properly implemented.

Q: What are the recommended use cases?

The model is well-suited for deployment scenarios where model size is a critical constraint but performance cannot be significantly compromised. It's particularly effective for general language understanding tasks, showing strong performance on benchmarks like MMLU and ARC.
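
For deployment experiments, a minimal loading-and-generation sketch is shown below. It assumes the checkpoint is published on the Hugging Face Hub under the repository id OPEA/DeepSeek-R1-int2-mixed-sym-inc (an assumption based on the model name) and that it loads through the standard transformers API; the INT2 kernels may additionally require the auto-round / Intel Neural Compressor packages.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPEA/DeepSeek-R1-int2-mixed-sym-inc"  # assumed Hub repository id
device_map = "cuda" if torch.cuda.is_available() else "cpu"  # CPU path may be more accurate

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",        # keep the dtypes stored in the mixed-precision checkpoint
    device_map=device_map,
    trust_remote_code=True,    # quantized checkpoints often ship custom loading code
)

prompt = "Summarize the trade-offs of INT2 weight quantization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```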
