QwQ-32B-int4-AutoRound-gptq-sym

Maintained by: OPEA

Property        Value
Base Model      Qwen/QwQ-32B
Quantization    INT4 with group size 128
Paper           arXiv:2309.05516
Author          OPEA

What is QwQ-32B-int4-AutoRound-gptq-sym?

This is a quantized version of the QwQ-32B language model, produced with Intel's AutoRound algorithm using symmetric INT4 quantization. It substantially reduces memory footprint and compute requirements while retaining roughly 99.5% of the original BF16 model's accuracy across the reported benchmarks.

Implementation Details

The model uses group-size-128 quantization with a symmetric scheme, tuned by AutoRound for 50 iterations at a learning rate of 5e-3 (see the recipe sketch after the highlights below). It is compatible with multiple inference backends, including CPU, HPU, and CUDA, making it versatile across deployment scenarios.

  • Achieves an average score of 0.6564 across the reported benchmarks
  • Maintains near-original performance on key tasks like MMLU (0.7940) and ARC-Easy (0.8152)
  • Optimized for efficient inference while preserving model capabilities
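
As a rough illustration of the recipe above, the sketch below shows how a comparable INT4 symmetric checkpoint could be produced with the auto-round library and exported in GPTQ format. It assumes the AutoRound Python API (bits, group_size, sym, iters, and lr arguments plus an "auto_gptq" export target); argument names can differ across library versions, and the calibration data used for the published checkpoint is not shown here.

```python
# Sketch only: reproduces the settings described above (INT4, symmetric,
# group size 128, 50 tuning iterations, lr 5e-3) with the auto-round library.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/QwQ-32B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    sym=True,   # symmetric quantization, as in this checkpoint
    iters=50,   # tuning iterations reported above
    lr=5e-3,    # learning rate reported above
)
autoround.quantize()

# Export in a GPTQ-compatible format so standard GPTQ kernels can load it.
autoround.save_quantized("./QwQ-32B-int4-AutoRound-gptq-sym", format="auto_gptq")
```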

Core Capabilities

  • Multi-task reasoning and problem-solving
  • Mathematical and logical analysis
  • Natural language understanding and generation
  • Cross-lingual capabilities (demonstrated in sample prompts)

Frequently Asked Questions

Q: What makes this model unique?

The model combines Intel's AutoRound quantization with a symmetric INT4 scheme, cutting weight memory to roughly a quarter of the BF16 baseline while keeping benchmark scores within about 0.5% of the original model.

Q: What are the recommended use cases?

The model suits deployments that need efficient inference without a significant accuracy trade-off, particularly where memory is constrained but output quality must be preserved.
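
For such deployments, the following is a minimal loading-and-generation sketch. Because the checkpoint is exported in GPTQ format, it can be loaded through the standard transformers interface, provided a GPTQ-capable backend is installed; the repository id used below is an assumption, so substitute the actual path if it differs.

```python
# Minimal inference sketch; assumes a GPTQ-capable backend is installed and
# that the checkpoint lives at the (assumed) repo id below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPEA/QwQ-32B-int4-AutoRound-gptq-sym"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # GPTQ kernels typically expect fp16 activations
    device_map="auto",          # spreads layers across available devices
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```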
