Phi-3-mini-128k-instruct-onnx

Maintained by: microsoft

Developer: Microsoft
License: MIT
Paper: AWQ Paper
Format: ONNX

What is Phi-3-mini-128k-instruct-onnx?

Phi-3-mini-128k-instruct-onnx is an optimized ONNX version of Microsoft's Phi-3 Mini model, designed for efficient inference across a range of hardware platforms. This variant supports a 128K-token context length and uses quantization to reduce memory use and latency while preserving accuracy.

Implementation Details

The model comes in multiple optimized configurations: int4 DML for Windows GPUs, fp16 CUDA for NVIDIA GPUs, int4 CUDA for NVIDIA GPUs, and int4 variants for CPU and mobile deployment. It leverages Activation Aware Quantization (AWQ) for the int4 versions, preserving accuracy while significantly reducing model size.

  • Supports DirectML for cross-platform GPU acceleration
  • Offers up to 9x faster inference compared to PyTorch on CUDA
  • Implements AWQ quantization for efficient compression
  • Compatible with Windows, Linux, and Mac platforms
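The idea behind the int4 compression can be pictured with a toy group-wise quantizer in plain Python: each group of weights shares one floating-point scale and each weight is stored as a 4-bit integer in [-8, 7]. This sketch picks scales by simple max-abs; AWQ additionally chooses scales using activation statistics, which is omitted here for illustration.

```python
def quantize_int4(weights, group_size=4):
    """Quantize a flat list of weights to int4 codes with one scale per group."""
    codes, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # Max-abs scaling so the largest weight in the group maps near +/-7.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales

def dequantize_int4(codes, scales, group_size=4):
    """Reconstruct approximate weights from int4 codes and per-group scales."""
    return [codes[i] * scales[i // group_size] for i in range(len(codes))]

w = [0.12, -0.53, 0.31, 0.07, 1.4, -0.8, 0.2, 0.05]
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print(q)      # 4-bit codes (stored compactly in the real format)
print(w_hat)  # reconstruction, close to the original weights
```

In the real model the codes are packed two per byte and the matrices are large, but the accuracy/size trade-off works the same way: quantization error is bounded per group by the chosen scale.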

Core Capabilities

  • High-performance text generation with 128K context window
  • Optimized for both CPU and GPU inference
  • Support for multiple precision formats (int4, fp16)
  • Cross-platform compatibility via ONNX Runtime
  • Enhanced instruction following and safety measures

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its optimized ONNX implementation that delivers up to 5x faster performance in FP16 and 9x faster in INT4 compared to PyTorch, while maintaining high accuracy through careful quantization techniques.

Q: What are the recommended use cases?

The model is ideal for applications requiring efficient text generation across various hardware platforms, particularly where deployment efficiency and long context handling are crucial. It's especially suitable for Windows-based applications leveraging DirectML for GPU acceleration.
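In practice, inference with the ONNX variants typically goes through the onnxruntime-genai package. The sketch below is a minimal example assuming that package and a locally downloaded model folder (the exact API surface varies by package version, so treat the `og.*` calls as indicative); the `build_prompt` helper follows Phi-3's instruct chat template.

```python
def build_prompt(user_message: str) -> str:
    # Phi-3 instruct chat template: user turn, end marker, then assistant turn.
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>"

def generate(model_dir: str, user_message: str, max_length: int = 512) -> str:
    # Assumes the onnxruntime-genai package is installed and model_dir points
    # at one of the downloaded variants (e.g. the int4 DML or CUDA folder).
    # API names follow its published Phi-3 examples and may differ by version.
    import onnxruntime_genai as og

    model = og.Model(model_dir)
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)
    params.input_ids = tokenizer.encode(build_prompt(user_message))

    generator = og.Generator(model, params)
    tokens = []
    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
        tokens.append(generator.get_next_tokens()[0])
    return tokenizer.decode(tokens)
```

The same script works across the DML, CUDA, and CPU builds; only the model folder (and the installed onnxruntime-genai flavor) changes.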
