Phi-3-mini-128k-instruct-onnx-tf
| Property | Value |
|---|---|
| License | MIT |
| Paper | AWQ Quantization Paper |
| Developer | Microsoft |
| Context Length | 128K tokens |
What is Phi-3-mini-128k-instruct-onnx-tf?
This is an optimized ONNX build of Microsoft's Phi-3-mini model, designed for high-performance inference across multiple hardware platforms. It pairs a lightweight architecture with a 128K token context length while maintaining high-quality reasoning capabilities.
Implementation Details
The model leverages ONNX Runtime optimizations and supports multiple precision formats, including INT4 quantization via AWQ. It ships in several configurations tuned for different hardware targets: DirectML for Windows GPUs, CUDA for NVIDIA GPUs, and dedicated builds for CPU and mobile deployment (see the installation sketch after the list below).
- Supports multiple hardware platforms (AMD, Intel, NVIDIA GPUs)
- Up to 9x faster than PyTorch for certain configurations
- Includes both FP16 and INT4 precision options
- Optimized for CPU, GPU, and mobile inference
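Getting started typically takes two steps: install the onnxruntime-genai package variant that matches your hardware, then point the Generate API at the downloaded model folder. The sketch below assumes the PyPI package names published by the onnxruntime-genai project and a hypothetical local model path; actual folder names vary by precision and hardware variant.

```python
# Pick the package that matches your hardware target (PyPI names from the
# onnxruntime-genai project; verify against current docs for your version):
#   pip install onnxruntime-genai            # CPU and mobile
#   pip install onnxruntime-genai-cuda       # NVIDIA GPUs (CUDA)
#   pip install onnxruntime-genai-directml   # Windows GPUs (DirectML)
import onnxruntime_genai as og

# Hypothetical local path: point this at whichever precision/hardware
# variant you downloaded (e.g. a CPU INT4 folder from the model repo).
model = og.Model("./phi3-mini-128k-onnx/cpu-int4")
tokenizer = og.Tokenizer(model)
```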
Core Capabilities
- 128K token context length support
- High-quality reasoning and instruction following
- Cross-platform compatibility
- Efficient memory usage through quantization
- Direct integration with the ONNX Runtime Generate API (see the sketch below)
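As a concrete example of the Generate API integration, the following minimal sketch streams tokens from the model. It follows the API shape of onnxruntime-genai around version 0.2/0.3 (later releases replaced `params.input_ids` and `compute_logits` with `generator.append_tokens`); the model path and prompt are illustrative, and the `<|user|>`/`<|assistant|>` markers come from the Phi-3 chat template.

```python
import onnxruntime_genai as og

model = og.Model("./phi3-mini-128k-onnx/cpu-int4")   # hypothetical path
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()                   # incremental detokenizer

# Phi-3 chat template: wrap the user turn and leave the assistant turn open.
prompt = "<|user|>\nExplain INT4 quantization in one paragraph.<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = tokenizer.encode(prompt)

# Generate and print tokens as they are produced.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```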
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its combination of extensive context length (128K tokens), optimized performance across various hardware platforms, and significant speed improvements over PyTorch implementations. The integration of AWQ quantization for INT4 precision makes it particularly efficient for deployment.
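To make the efficiency claim concrete, the sketch below shows plain per-group symmetric INT4 quantization with NumPy. It illustrates why INT4 shrinks weight storage roughly 8x versus FP32; it is not the actual AWQ algorithm, which additionally rescales salient weight channels based on activation statistics before quantizing.

```python
import numpy as np

def quantize_int4_groups(weights, group_size=32):
    """Illustrative per-group symmetric INT4 quantization (not AWQ itself)."""
    w = weights.reshape(-1, group_size)
    # Scale each group so its largest magnitude maps to the INT4 limit 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0                     # guard against all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int4_groups(w)
w_hat = dequantize(q, s, w.shape)
# Each weight now needs 4 bits (plus one scale per 32 weights) instead of 32.
print("max reconstruction error:", float(np.abs(w - w_hat).max()))
```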
Q: What are the recommended use cases?
The model is ideal for applications requiring efficient inference across different hardware platforms, particularly where context length and performance are crucial. It's especially suitable for Windows applications leveraging DirectML, NVIDIA GPU deployments, and resource-constrained environments like mobile devices.