QwQ-32B-int4-AutoRound-gptq-sym

Maintained by: OPEA

Property        Value
Base Model      Qwen/QwQ-32B
Quantization    INT4 with group size 128
Paper           arXiv:2309.05516
Author          OPEA

What is QwQ-32B-int4-AutoRound-gptq-sym?

This is a quantized version of the QwQ-32B language model, produced with Intel's AutoRound algorithm using symmetric INT4 quantization. It substantially reduces memory footprint and compute requirements while retaining roughly 99.5% of the original BF16 model's accuracy across the reported benchmarks.

Implementation Details

The model uses group-size-128 quantization with a symmetric scheme, tuned by AutoRound for 50 iterations at a learning rate of 5e-3 (see the recipe sketch after the highlights below). It is compatible with multiple inference backends, including CPU, HPU, and CUDA, making it versatile across deployment scenarios.

  • Achieves an average score of 0.6564 across the reported benchmarks
  • Maintains near-original performance on key tasks like MMLU (0.7940) and ARC-Easy (0.8152)
  • Optimized for efficient inference while preserving model capabilities
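
As a rough illustration of the recipe above, the sketch below shows how a comparable INT4 symmetric checkpoint could be produced with the auto-round library and exported in GPTQ format. It assumes the AutoRound Python API (bits, group_size, sym, iters, and lr arguments plus an "auto_gptq" export target); argument names can differ across library versions, and the calibration data used for the published checkpoint is not shown here.

```python
# Sketch only: reproduces the settings described above (INT4, symmetric,
# group size 128, 50 tuning iterations, lr 5e-3) with the auto-round library.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/QwQ-32B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    sym=True,   # symmetric quantization, as in this checkpoint
    iters=50,   # tuning iterations reported above
    lr=5e-3,    # learning rate reported above
)
autoround.quantize()

# Export in a GPTQ-compatible format so standard GPTQ kernels can load it.
autoround.save_quantized("./QwQ-32B-int4-AutoRound-gptq-sym", format="auto_gptq")
```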

Core Capabilities

  • Multi-task reasoning and problem-solving
  • Mathematical and logical analysis
  • Natural language understanding and generation
  • Cross-lingual capabilities (demonstrated in sample prompts)

Frequently Asked Questions

Q: What makes this model unique?

The model combines Intel's AutoRound quantization with a symmetric INT4 scheme, cutting weight memory to roughly a quarter of the BF16 baseline while keeping benchmark scores within about 0.5% of the original model.

Q: What are the recommended use cases?

The model suits deployments that need efficient inference without a significant accuracy trade-off, particularly where memory is constrained but output quality must be preserved.
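
For such deployments, the following is a minimal loading-and-generation sketch. Because the checkpoint is exported in GPTQ format, it can be loaded through the standard transformers interface, provided a GPTQ-capable backend is installed; the repository id used below is an assumption, so substitute the actual path if it differs.

```python
# Minimal inference sketch; assumes a GPTQ-capable backend is installed and
# that the checkpoint lives at the (assumed) repo id below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPEA/QwQ-32B-int4-AutoRound-gptq-sym"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # GPTQ kernels typically expect fp16 activations
    device_map="auto",          # spreads layers across available devices
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```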
