Meta-Llama-3-8B-Instruct-quantized.w8a16
| Property | Value |
|---|---|
| Base Model | Meta-Llama-3-8B-Instruct |
| Quantization | INT8 weights |
| Release Date | July 2, 2024 |
| License | Llama 3 |
| Developer | Neural Magic |
What is Meta-Llama-3-8B-Instruct-quantized.w8a16?
This is a quantized version of Meta's Llama-3-8B-Instruct model, optimized for efficient deployment while delivering nearly identical performance to the original. The "w8a16" suffix denotes 8-bit integer weights with 16-bit activations: storing the weights as 8-bit integers instead of 16-bit values cuts the model's memory footprint roughly in half with no significant loss of accuracy.
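As a back-of-envelope check on that figure: 8 billion parameters stored as 16-bit values occupy about 8B × 2 bytes ≈ 16 GB, while the same weights in INT8 occupy about 8B × 1 byte ≈ 8 GB, plus a small overhead for the quantization scales.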
Implementation Details
The model employs symmetric per-channel quantization targeting the linear operators within transformer blocks. It was quantized using AutoGPTQ with a 1% damping factor, calibrated on 256 sequences of 8,192 random tokens each. The result scores an average of 68.69 on the OpenLLM benchmark (version 1), slightly above the original model's 68.54.
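As a rough illustration, a quantization run with these hyperparameters could look like the following AutoGPTQ sketch. The exact Neural Magic recipe is not reproduced here, so the calibration-data construction and output path are assumptions:

```python
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 8-bit symmetric per-channel weight quantization with a 1% damping
# factor, matching the settings described above.
quantize_config = BaseQuantizeConfig(
    bits=8,             # INT8 weights
    group_size=-1,      # -1 = one scale per output channel (per-channel)
    sym=True,           # symmetric quantization
    damp_percent=0.01,  # 1% damping factor
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# 256 calibration sequences of 8,192 random tokens each; the exact
# sampling procedure used for the published checkpoint is an assumption.
examples = [
    {
        "input_ids": torch.randint(0, tokenizer.vocab_size, (1, 8192)),
        "attention_mask": torch.ones(1, 8192, dtype=torch.long),
    }
    for _ in range(256)
]

model.quantize(examples)
model.save_quantized("Meta-Llama-3-8B-Instruct-quantized.w8a16")
```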
Key implementation highlights:
- Weights quantized from 16-bit to 8-bit integers (INT8)
- Roughly 50% reduction in disk size and GPU memory requirements
- Maintains performance across key benchmarks, including MMLU, GSM-8K, and HellaSwag
- Compatible with both the vLLM and Hugging Face transformers libraries (see the deployment sketch below)
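For example, serving the model with vLLM might look like the following minimal sketch; the Hugging Face repository id and the sampling parameters are assumptions:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a16"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a Llama 3 chat prompt from the model's chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

llm = LLM(model=model_id)
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```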
Core Capabilities
- Assistant-like chat functionality in English
- Commercial and research applications
- Efficient deployment with reduced resource requirements
- High performance on reasoning and knowledge-based tasks
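For chat use through Hugging Face transformers, a sketch along these lines should work, assuming the same repository id and that any extra packages needed for the quantized checkpoint format (e.g. optimum/auto-gptq for GPTQ-style checkpoints) are installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a16"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers across available GPUs/CPU
    torch_dtype="auto",  # keep activations in the checkpoint's 16-bit dtype
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain INT8 weight-only quantization in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```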
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for halving weight storage through INT8 quantization while matching, and on the OpenLLM v1 average slightly exceeding, the original model's benchmark scores. In practice, that means real deployment savings without a measurable loss of capability.
Q: What are the recommended use cases?
The model is designed for English-language applications in commercial and research contexts, particularly assistant-like chat scenarios. It is not recommended for languages other than English or for any use that violates applicable laws or regulations.