Llama-3.2-1B-Instruct-FP8-dynamic

Maintained By: neuralmagic

Model Base: Meta-Llama-3.2
Release Date: 9/25/2024
License: llama3.2
Developer: Neural Magic
Hugging Face: Model Repository

What is Llama-3.2-1B-Instruct-FP8-dynamic?

Llama-3.2-1B-Instruct-FP8-dynamic is a quantized version of the Llama-3.2-1B-Instruct model in which both weights and activations are represented in FP8. This roughly halves the model's memory footprint while preserving accuracy: the quantized model recovers 99.7% of the original model's scores across the evaluated benchmarks.
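The sketch below is a back-of-the-envelope check on the memory claim. It estimates weight storage when some parameters are stored in 8 bits and the rest stay in 16 bits, as happens for parameters outside the quantized linear layers. The parameter counts are made-up round numbers, not Llama-3.2-1B's real layer sizes. The small overhead of the per-channel scales is ignored.

```python
def weight_bytes(quantized_params: int, other_params: int,
                 quant_bits: int = 8, other_bits: int = 16) -> int:
    """Weight storage in bytes; per-channel scale overhead is ignored."""
    return (quantized_params * quant_bits + other_params * other_bits) // 8


def footprint_reduction(quantized_params: int, other_params: int) -> float:
    """Fraction of weight memory saved versus keeping every parameter in 16 bits."""
    before = weight_bytes(0, quantized_params + other_params)
    after = weight_bytes(quantized_params, other_params)
    return 1 - after / before


# Hypothetical parameter counts, chosen only to illustrate the arithmetic.
print(footprint_reduction(1_000_000, 0))      # every weight quantized
print(footprint_reduction(900_000, 100_000))  # 10% of weights left in 16 bits
```

The first call gives exactly 0.5 and the second about 0.45. The saving approaches 50% only as the share of quantized parameters approaches 100%.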

Implementation Details

The model was created by quantizing the original 16-bit weights and activations to 8-bit (FP8) representations. Key technical aspects include:

  • Symmetric per-channel quantization for weights
  • Dynamic per-token quantization for activations
  • Approximately 50% reduction in disk size and GPU memory requirements
  • Quantization applied only to linear operators within transformer blocks
  • Can be served efficiently with the vLLM inference engine
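The two quantization schemes listed above can be simulated in plain Python. This is a minimal sketch under stated assumptions:

- It assumes the FP8 E4M3 format (largest finite value 448), which the card does not name explicitly.
- It rounds values in software instead of running FP8 matrix multiplies on hardware.
- The function names (`round_to_e4m3`, `quantize_rows`, `fp8_linear`) are hypothetical and not part of vLLM or any Neural Magic tool.

Weights get one scale per output channel, computed once. Activations get one scale per token, recomputed for every input, which is what makes the activation scheme "dynamic".

```python
import math

# Assumed format: FP8 E4M3 (4 exponent bits with bias 7, 3 mantissa bits, no infinities).
FP8_E4M3_MAX = 448.0  # largest finite E4M3 value
MIN_NORMAL_EXP = -6   # smallest normal E4M3 value is 2**-6
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    mag = abs(x)
    # frexp returns mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    _, e = math.frexp(mag)
    # Below 2**-6 the format is subnormal, so the spacing stops shrinking.
    exp = max(e - 1, MIN_NORMAL_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # gap between neighbouring E4M3 values
    rounded = round(mag / step) * step    # Python's round() breaks ties to even
    return math.copysign(min(rounded, FP8_E4M3_MAX), x)


def quantize_rows(matrix):
    """Symmetric quantization, one scale per row: v ~= scale * round_to_e4m3(v / scale)."""
    q_rows, scales = [], []
    for row in matrix:
        amax = max(abs(v) for v in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0  # map the row's max to 448
        q_rows.append([round_to_e4m3(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def fp8_linear(x, weight):
    """Approximate y[t][o] = sum_i x[t][i] * weight[o][i] with FP8-rounded operands.

    x:      one row per token (activations)
    weight: one row per output channel
    """
    # Weights: per-channel scales; a real deployment computes these once, offline.
    wq, w_scales = quantize_rows(weight)
    # Activations: per-token scales, recomputed from each new input ("dynamic").
    xq, x_scales = quantize_rows(x)
    # Both scales are constant along each dot product, so they factor out of the sum.
    return [
        [sum(a * b for a, b in zip(x_row, w_row)) * sx * sw
         for w_row, sw in zip(wq, w_scales)]
        for x_row, sx in zip(xq, x_scales)
    ]


x = [[1.0, 2.0, 3.0]]     # one token, three features (toy values)
w = [[0.5, 0.25, 0.125]]  # one output channel (toy values)
print(fp8_linear(x, w))
```

With these toy values the result lands within a few percent of the exact value 1.375. The gap comes from FP8 rounding, which has only 3 mantissa bits.

Per-token and per-channel granularity keeps the scales outside the inner sum. An FP8 kernel can therefore multiply the raw 8-bit values and apply both scales once per output element.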

Core Capabilities

  • Scores 47.55% on MMLU (5-shot), on par with the original model
  • Achieves 57.25% on ARC Challenge (0-shot)
  • Scores 45.94% on GSM8K with chain-of-thought prompting (8-shot)
  • Also evaluated on Winogrande, HellaSwag, and TruthfulQA
  • Intended for assistant-like chat applications
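A recovery figure like the 99.7% above is typically the quantized model's score divided by the unquantized model's score, expressed as a percentage. The sketch below shows one way to compute it per benchmark and average it.

Several things are assumptions or placeholders:

- The card does not say which aggregation produced its figure.
- The baseline scores are not listed here.
- The task names and numbers in the example are placeholders, not real benchmark results.

```python
def recovery(quantized_score: float, baseline_score: float) -> float:
    """Quantized score as a percentage of the unquantized (baseline) score."""
    return 100.0 * quantized_score / baseline_score


def mean_recovery(quantized: dict, baseline: dict) -> float:
    """Average per-benchmark recovery over the benchmarks scored for both models."""
    shared = sorted(quantized.keys() & baseline.keys())
    if not shared:
        raise ValueError("no benchmarks in common")
    return sum(recovery(quantized[name], baseline[name]) for name in shared) / len(shared)


# Placeholder scores for two hypothetical tasks, not real benchmark results.
quantized_scores = {"task_a": 49.0, "task_b": 60.0}
baseline_scores = {"task_a": 50.0, "task_b": 60.0}
print(mean_recovery(quantized_scores, baseline_scores))  # 99.0
```

An alternative aggregation divides the average quantized score by the average baseline score. The two methods can give slightly different numbers.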

Frequently Asked Questions

Q: What makes this model unique?

Its distinguishing feature is the FP8 quantization scheme, which substantially reduces memory requirements while keeping accuracy close to the original model. Weights use symmetric per-channel quantization, and activations use dynamic per-token quantization computed at inference time.

Q: What are the recommended use cases?

The model is intended for commercial and research use in English, particularly for assistant-like chat. It is well suited to deployments where memory and compute are constrained but accuracy close to the unquantized model is still needed.
