Meta-Llama-3-8B-Instruct-FP8

Meta-Llama-3-8B-Instruct-FP8

neuralmagic

8B parameter Llama-3 model optimized with FP8 quantization, achieving 99.28% accuracy recovery vs original while halving memory requirements

PropertyValue
Parameter Count8.03B
Model TypeInstruction-tuned Language Model
LicenseLlama3
QuantizationFP8
LanguageEnglish

What is Meta-Llama-3-8B-Instruct-FP8?

Meta-Llama-3-8B-Instruct-FP8 is a quantized version of the original Llama-3 8B model, optimized for efficient deployment while maintaining near-original performance. Through FP8 quantization, it reduces the model's memory footprint by approximately 50% while preserving 99.28% of the original model's accuracy.

Implementation Details

The model implements symmetric per-tensor quantization on the linear operators within transformer blocks, using AutoFP8 with calibration samples from UltraChat. It's specifically designed for deployment with vLLM >= 0.5.0 and achieves an impressive average score of 68.22 on the OpenLLM benchmark.

  • Weight and activation quantization using FP8 data type
  • 50% reduction in disk size and GPU memory requirements
  • Optimized for vLLM deployment
  • Calibrated using 512 sequences of UltraChat

Core Capabilities

  • Assistant-like chat functionality
  • Maintains high performance across various benchmarks (MMLU, ARC Challenge, GSM-8K)
  • Efficient inference with reduced resource requirements
  • English language processing and generation

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient FP8 quantization that significantly reduces resource requirements while maintaining 99.28% of the original model's performance. It's specifically optimized for production deployment with vLLM.

Q: What are the recommended use cases?

The model is best suited for commercial and research applications requiring English language processing, particularly in assistant-like chat scenarios where resource efficiency is important. It's designed to handle various tasks while consuming less memory than the original model.

Socials
PromptLayer
Company
All services online
Location IconPromptLayer is located in the heart of New York City
PromptLayer © 2026