Phi-3-mini-128k-instruct-onnx-tf
| Property | Value |
|---|---|
| License | MIT |
| Paper | AWQ Quantization Paper |
| Developer | Microsoft |
| Context Length | 128K tokens |
What is Phi-3-mini-128k-instruct-onnx-tf?
This is an optimized ONNX build of Microsoft's Phi-3-mini model, designed for high-performance inference across multiple hardware platforms. It pairs a lightweight architecture with a 128K token context length while maintaining high-quality reasoning capabilities.
Implementation Details
The model leverages ONNX Runtime optimizations and supports multiple precision formats, including INT4 quantization via AWQ. It ships in several configurations tuned for different hardware targets: DirectML for Windows GPUs, CUDA for NVIDIA GPUs, and dedicated builds for CPU and mobile deployment (see the installation sketch after the list below).
- Supports multiple hardware platforms (AMD, Intel, NVIDIA GPUs)
- Up to 9x faster than PyTorch for certain configurations
- Includes both FP16 and INT4 precision options
- Optimized for CPU, GPU, and mobile inference
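Getting started typically takes two steps: install the onnxruntime-genai package variant that matches your hardware, then point the Generate API at the downloaded model folder. The sketch below assumes the PyPI package names published by the onnxruntime-genai project and a hypothetical local model path; actual folder names vary by precision and hardware variant.

```python
# Pick the package that matches your hardware target (PyPI names from the
# onnxruntime-genai project; verify against current docs for your version):
#   pip install onnxruntime-genai            # CPU and mobile
#   pip install onnxruntime-genai-cuda       # NVIDIA GPUs (CUDA)
#   pip install onnxruntime-genai-directml   # Windows GPUs (DirectML)
import onnxruntime_genai as og

# Hypothetical local path: point this at whichever precision/hardware
# variant you downloaded (e.g. a CPU INT4 folder from the model repo).
model = og.Model("./phi3-mini-128k-onnx/cpu-int4")
tokenizer = og.Tokenizer(model)
```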
Core Capabilities
- 128K token context length support
- High-quality reasoning and instruction following
- Cross-platform compatibility
- Efficient memory usage through quantization
- Direct integration with the ONNX Runtime Generate API (see the sketch below)
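As a concrete example of the Generate API integration, the following minimal sketch streams tokens from the model. It follows the API shape of onnxruntime-genai around version 0.2/0.3 (later releases replaced `params.input_ids` and `compute_logits` with `generator.append_tokens`); the model path and prompt are illustrative, and the `<|user|>`/`<|assistant|>` markers come from the Phi-3 chat template.

```python
import onnxruntime_genai as og

model = og.Model("./phi3-mini-128k-onnx/cpu-int4")   # hypothetical path
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()                   # incremental detokenizer

# Phi-3 chat template: wrap the user turn and leave the assistant turn open.
prompt = "<|user|>\nExplain INT4 quantization in one paragraph.<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = tokenizer.encode(prompt)

# Generate and print tokens as they are produced.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```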
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its combination of extensive context length (128K tokens), optimized performance across various hardware platforms, and significant speed improvements over PyTorch implementations. The integration of AWQ quantization for INT4 precision makes it particularly efficient for deployment.
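To make the efficiency claim concrete, the sketch below shows plain per-group symmetric INT4 quantization with NumPy. It illustrates why INT4 shrinks weight storage roughly 8x versus FP32; it is not the actual AWQ algorithm, which additionally rescales salient weight channels based on activation statistics before quantizing.

```python
import numpy as np

def quantize_int4_groups(weights, group_size=32):
    """Illustrative per-group symmetric INT4 quantization (not AWQ itself)."""
    w = weights.reshape(-1, group_size)
    # Scale each group so its largest magnitude maps to the INT4 limit 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0                     # guard against all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int4_groups(w)
w_hat = dequantize(q, s, w.shape)
# Each weight now needs 4 bits (plus one scale per 32 weights) instead of 32.
print("max reconstruction error:", float(np.abs(w - w_hat).max()))
```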
Q: What are the recommended use cases?
The model is ideal for applications requiring efficient inference across different hardware platforms, particularly where context length and performance are crucial. It's especially suitable for Windows applications leveraging DirectML, NVIDIA GPU deployments, and resource-constrained environments like mobile devices.