QwQ-32B-int4-AutoRound-gptq-sym
| Property | Value |
|---|---|
| Base Model | Qwen/QwQ-32B |
| Quantization | INT4 with group size 128 |
| Paper | arXiv:2309.05516 |
| Author | OPEA |
What is QwQ-32B-int4-AutoRound-gptq-sym?
This is an INT4-quantized version of the QwQ-32B language model, produced with Intel's AutoRound algorithm using symmetric weight quantization. It significantly reduces memory footprint and compute requirements while retaining about 99.5% of the original BF16 model's performance across a range of benchmarks.
Implementation Details
The model uses group-size 128 INT4 quantization with symmetric weights, tuned over 50 AutoRound iterations with a learning rate of 5e-3. It is compatible with CPU, HPU, and CUDA inference backends, making it suitable for a range of deployment scenarios.
- Achieves a 0.6564 average score across major benchmarks
- Maintains near-original performance on key tasks like MMLU (0.7940) and ARC-Easy (0.8152)
- Optimized for efficient inference while preserving model capabilities
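Since the weights are stored in GPTQ format, the checkpoint should load through the standard Hugging Face `transformers` stack once a GPTQ-capable backend (e.g. `gptqmodel`, or `auto-gptq` with `optimum`) is installed. The sketch below is a minimal, non-authoritative example: the repo id `OPEA/QwQ-32B-int4-AutoRound-gptq-sym` is assumed from the card title and author, and HPU deployment (which goes through Intel's Gaudi software stack) is not shown.

```python
# Minimal loading sketch (assumed repo id; requires transformers plus a GPTQ backend).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPEA/QwQ-32B-int4-AutoRound-gptq-sym"  # assumption: inferred from the card title/author

# device_map="auto" places the model on available CUDA devices;
# pass device_map="cpu" for CPU-only inference (slower, and it needs a
# GPTQ kernel with CPU support).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype recorded in the checkpoint config
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Inspect the quantization settings (bits, group size, symmetric flag).
print(model.config.quantization_config)
```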
Core Capabilities
- Multi-task reasoning and problem-solving
- Mathematical and logical analysis
- Natural language understanding and generation
- Cross-lingual capabilities (demonstrated in sample prompts)
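To exercise these capabilities, a hedged generation sketch follows. The prompts are illustrative only (not the card's own sample prompts), the repo id is again assumed, and the sampling settings are typical values rather than ones stated in this card.

```python
# Generation sketch with the assumed repo id; prompts are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPEA/QwQ-32B-int4-AutoRound-gptq-sym"  # assumed repo id
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# One English and one Chinese prompt, to touch the cross-lingual behaviour noted above.
prompts = [
    "How many times does the letter 'r' appear in the word 'strawberry'?",
    "9.8和9.11哪个数字更大？请说明理由。",  # "Which is larger, 9.8 or 9.11? Explain."
]

for prompt in prompts:
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
    )
    # Decode only the newly generated tokens, not the prompt.
    completion = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    print(prompt, "->", completion)
```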
Frequently Asked Questions
Q: What makes this model unique?
The model combines Intel's AutoRound quantization algorithm with symmetric INT4 weights, achieving strong compression while staying within 0.5% of the original model's benchmark scores.
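For readers curious how such a checkpoint could be produced, the sketch below uses the open-source `auto-round` library with the hyperparameters quoted in this card (INT4, group size 128, symmetric, 50 iterations, lr 5e-3) and exports to the GPTQ format. This is an approximation based on auto-round's documented API; the exact recipe behind this checkpoint may differ.

```python
# Hypothetical reproduction sketch using the auto-round library; the actual
# recipe behind this checkpoint may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

base_model = "Qwen/QwQ-32B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Hyperparameters mirror the card: INT4, group size 128, symmetric weights,
# 50 tuning iterations, learning rate 5e-3.
autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    sym=True,
    iters=50,
    lr=5e-3,
)
autoround.quantize()

# Export in the GPTQ layout the card's name indicates ("gptq-sym").
autoround.save_quantized("./QwQ-32B-int4-AutoRound-gptq-sym", format="auto_gptq")
```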
Q: What are the recommended use cases?
The model is well suited to deployments that need efficient inference with minimal accuracy loss, particularly where memory is constrained but performance cannot be significantly compromised.