Meta-Llama-3-8B-Instruct-quantized.w8a16
| Property | Value |
|---|---|
| Base Model | Meta-Llama-3-8B-Instruct |
| Quantization | INT8 weights |
| Release Date | July 2, 2024 |
| License | Llama 3 |
| Developer | Neural Magic |
What is Meta-Llama-3-8B-Instruct-quantized.w8a16?
This is a quantized version of Meta's Llama-3-8B-Instruct model, optimized for efficient deployment while delivering nearly identical performance to the original. The "w8a16" suffix denotes 8-bit integer weights with 16-bit activations: storing the weights as 8-bit integers instead of 16-bit values cuts the model's memory footprint roughly in half with no significant loss of accuracy.
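As a back-of-envelope check on that figure: 8 billion parameters stored as 16-bit values occupy about 8B × 2 bytes ≈ 16 GB, while the same weights in INT8 occupy about 8B × 1 byte ≈ 8 GB, plus a small overhead for the quantization scales.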
Implementation Details
The model employs symmetric per-channel quantization targeting the linear operators within transformer blocks. It was quantized using AutoGPTQ with a 1% damping factor, calibrated on 256 sequences of 8,192 random tokens each. The result scores an average of 68.69 on the OpenLLM benchmark (version 1), slightly above the original model's 68.54.
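As a rough illustration, a quantization run with these hyperparameters could look like the following AutoGPTQ sketch. The exact Neural Magic recipe is not reproduced here, so the calibration-data construction and output path are assumptions:

```python
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 8-bit symmetric per-channel weight quantization with a 1% damping
# factor, matching the settings described above.
quantize_config = BaseQuantizeConfig(
    bits=8,             # INT8 weights
    group_size=-1,      # -1 = one scale per output channel (per-channel)
    sym=True,           # symmetric quantization
    damp_percent=0.01,  # 1% damping factor
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# 256 calibration sequences of 8,192 random tokens each; the exact
# sampling procedure used for the published checkpoint is an assumption.
examples = [
    {
        "input_ids": torch.randint(0, tokenizer.vocab_size, (1, 8192)),
        "attention_mask": torch.ones(1, 8192, dtype=torch.long),
    }
    for _ in range(256)
]

model.quantize(examples)
model.save_quantized("Meta-Llama-3-8B-Instruct-quantized.w8a16")
```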
Key implementation highlights:
- Weights quantized from 16-bit to 8-bit integers (INT8)
- Roughly 50% reduction in disk size and GPU memory requirements
- Maintains performance across key benchmarks, including MMLU, GSM-8K, and HellaSwag
- Compatible with both the vLLM and Hugging Face transformers libraries (see the deployment sketch below)
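For example, serving the model with vLLM might look like the following minimal sketch; the Hugging Face repository id and the sampling parameters are assumptions:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a16"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a Llama 3 chat prompt from the model's chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

llm = LLM(model=model_id)
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```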
Core Capabilities
- Assistant-like chat functionality in English
- Commercial and research applications
- Efficient deployment with reduced resource requirements
- High performance on reasoning and knowledge-based tasks
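For chat use through Hugging Face transformers, a sketch along these lines should work, assuming the same repository id and that any extra packages needed for the quantized checkpoint format (e.g. optimum/auto-gptq for GPTQ-style checkpoints) are installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a16"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers across available GPUs/CPU
    torch_dtype="auto",  # keep activations in the checkpoint's 16-bit dtype
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain INT8 weight-only quantization in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```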
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for halving weight storage through INT8 quantization while matching, and on the OpenLLM v1 average slightly exceeding, the original model's benchmark scores. In practice, that means real deployment savings without a measurable loss of capability.
Q: What are the recommended use cases?
The model is designed for English-language applications in commercial and research contexts, particularly assistant-like chat scenarios. It is not recommended for languages other than English or for any use that violates applicable laws or regulations.