# Meta-Llama-3.2-3B-Instruct-ONNX-INT4
| Property | Value |
|---|---|
| Base Model | Meta-Llama/Llama-3.2-3B-Instruct |
| License | NVIDIA Open Model License |
| Supported Languages | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai |
| Hardware Requirements | NVIDIA Ampere and newer GPUs (6GB+ VRAM) |
## What is Meta-Llama-3.2-3B-Instruct-ONNX-INT4?
This is a quantized version of Meta's Llama-3.2-3B-Instruct model, optimized with NVIDIA's TensorRT Model Optimizer. The weights have been converted to INT4 precision to reduce the memory footprint while preserving most of the base model's accuracy, making it well suited to efficient deployment on NVIDIA GPUs.
## Implementation Details
The model uses an optimized transformer architecture and was converted in stages: from PyTorch bfloat16 to ONNX FP16, then to ONNX INT4 using AWQ quantization. It is designed specifically for Windows environments and uses the Onnxruntime-GenAI-DirectML backend for inference.
- Architecture: Auto-regressive transformer-based language model
- Quantization: INT4 precision using AWQ
- MMLU Accuracy: 57.71% (5-shot)
- Inference Backend: Onnxruntime-GenAI-DirectML
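The core idea behind INT4 quantization can be sketched in a few lines. The example below is a simplified, illustrative stand-in: it shows symmetric group-wise 4-bit quantization, whereas real AWQ additionally rescales weight channels based on activation statistics before quantizing. The group size of 128 and the helper names are assumptions, not details from the model's actual conversion pipeline.

```python
# Simplified sketch of group-wise INT4 weight quantization.
# (Real AWQ also applies activation-aware per-channel scaling
# before this step; that part is omitted here.)

def quantize_int4(weights, group_size=128):
    """Quantize a flat list of floats to signed INT4 values per group.

    Returns (qvals, scales): qvals in [-8, 7], one float scale per group.
    """
    qvals, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid div-by-zero
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qvals, scales

def dequantize_int4(qvals, scales, group_size=128):
    """Recover approximate float weights from INT4 values and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]

# Round-trip a toy weight vector and check the reconstruction error.
weights = [0.03 * ((-1) ** i) * (i % 11) for i in range(256)]
q, s = quantize_int4(weights)
recovered = dequantize_int4(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(f"max abs error: {max_err:.4f}")  # bounded by half a quantization step
```

Each group stores only 4 bits per weight plus one shared scale, which is where the memory savings over FP16 come from; the reconstruction error stays within half a quantization step per weight.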
## Core Capabilities
- Multi-language support across 8 languages
- Optimized for commercial and research applications
- Efficient inference on NVIDIA Ampere and newer GPUs
- Reduced memory footprint through INT4 quantization
- Compatible with Windows operating systems
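A back-of-envelope calculation shows why INT4 quantization brings the model within reach of the 6GB VRAM requirement. The parameter count and group size below are approximations used for illustration, and the estimate covers weights only; the KV cache and activations need additional memory at inference time.

```python
# Rough VRAM estimate for the weights alone (illustrative numbers).

PARAMS = 3.2e9       # ~3.2B parameters (approximate)
GROUP_SIZE = 128     # assumed quantization group size

fp16_gb = PARAMS * 2 / 1e9                    # 2 bytes per weight -> 6.4 GB
int4_gb = PARAMS * 0.5 / 1e9                  # 4 bits per weight
scales_gb = (PARAMS / GROUP_SIZE) * 2 / 1e9   # one FP16 scale per group

print(f"FP16 weights: {fp16_gb:.1f} GB")
print(f"INT4 weights + scales: {int4_gb + scales_gb:.1f} GB")
```

Even with the per-group scale overhead, the quantized weights take roughly a quarter of the FP16 footprint, which is what makes 6GB-class GPUs viable.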
## Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its INT4 quantization, which significantly reduces memory requirements while retaining good accuracy. It is optimized specifically for NVIDIA GPUs and offers a practical way to deploy a capable language model in resource-constrained environments.
Q: What are the recommended use cases?
The model is suitable for both commercial and research applications requiring efficient natural language processing. Its multi-language support and optimized performance make it ideal for applications where memory efficiency and processing speed are crucial, particularly on Windows systems with NVIDIA GPUs.