Llama-3.2-3B-Instruct-ONNX

Maintained By: onnx-community

Property              Value
--------------------  ---------------------------------
License               MIT + Llama 3.2 Community License
Supported Languages   8 (en, de, fr, it, pt, hi, es, th)
Framework             ONNX Runtime
Development           ONNX Runtime, Microsoft

What is Llama-3.2-3B-Instruct-ONNX?

Llama-3.2-3B-Instruct-ONNX is an optimized version of Meta's Llama-3.2-3B-Instruct model, converted to ONNX format for faster inference. The conversion delivers significant speedups: up to 39x faster than PyTorch compile on A100 GPUs and up to 1.25x faster than llama.cpp on standard CPU configurations.

Implementation Details

The model has been optimized for both CPU and GPU deployment, using int4 quantization via RTN (round-to-nearest). It supports multiple hardware configurations, including A100 GPUs, standard CPU environments, and AMD processors.

  • Supports both CPU and GPU execution with int4 quantization
  • Optimized for DirectX 12-capable GPUs with 4GB+ RAM
  • CUDA support for NVIDIA GPUs with Compute Capability >= 7.0
  • Comprehensive multilingual support across 8 languages
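To make the int4 RTN scheme mentioned above concrete, here is a minimal NumPy sketch of symmetric round-to-nearest quantization with one scale per weight group. This is illustrative only: the actual ONNX Runtime int4 quantizer additionally packs two 4-bit codes per byte and handles zero-points and padding, and the group size here (32) is an assumed example value.

```python
import numpy as np

def rtn_int4_quantize(weights: np.ndarray, group_size: int = 32):
    """Symmetric round-to-nearest (RTN) int4 quantization, one scale per group.

    Illustrative sketch; the real ONNX Runtime quantizer also packs
    two 4-bit codes per byte and manages zero-points and padding.
    """
    groups = weights.reshape(-1, group_size)
    # One scale per group, mapping the largest magnitude onto the int4 range.
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero for all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def rtn_int4_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from int4 codes and per-group scales."""
    return q.astype(np.float32) * scale
```

Because RTN rounds each value to the nearest representable step, the per-element reconstruction error is bounded by half a scale step for its group.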

Core Capabilities

  • High-performance text generation and instruction following
  • Efficient inference across various hardware platforms
  • Integrated with ONNX Runtime Generate() API for easy deployment
  • Supports both mobile and server-side deployment
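The Generate() API integration mentioned above can be sketched with the `onnxruntime-genai` Python package. The `model_dir` argument is a placeholder for a local download of this model's ONNX files, and exact method names may vary between package versions; treat this as a sketch rather than a definitive recipe.

```python
def generate(model_dir: str, prompt: str, max_length: int = 256) -> str:
    """Sketch of text generation with the ONNX Runtime Generate() API.

    Assumes `pip install onnxruntime-genai` and a local directory containing
    the model's ONNX files; API details may differ across package versions.
    """
    import onnxruntime_genai as og

    model = og.Model(model_dir)          # loads the ONNX model + genai config
    tokenizer = og.Tokenizer(model)

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))

    # Token-by-token decoding loop until EOS or max_length is reached.
    while not generator.is_done():
        generator.generate_next_token()

    return tokenizer.decode(generator.get_sequence(0))
```

The same loop structure works on CPU, CUDA, and DirectML builds of the runtime; only the installed `onnxruntime-genai` package variant changes.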

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its optimized performance through ONNX Runtime integration, offering significant speed improvements over standard implementations while maintaining model quality through careful quantization.

Q: What are the recommended use cases?

The model is ideal for applications requiring efficient text generation and instruction following, particularly in production environments where performance optimization is crucial. It's especially suitable for deployment across various hardware configurations, from mobile devices to high-performance servers.
