
Microsoft Phi-4 ONNX

  • Developer: Microsoft
  • Model Type: ONNX Optimized Language Model
  • License: MIT
  • Model URL: https://huggingface.co/microsoft/phi-4-onnx

What is phi-4-onnx?

Phi-4-onnx is an optimized ONNX implementation of Microsoft's Phi-4 language model, specifically designed for efficient inference across various computing platforms. This version leverages ONNX Runtime to deliver enhanced performance while maintaining the base model's capabilities. The model implements int4 quantization techniques for both CPU and GPU deployments, making it particularly suitable for resource-constrained environments.
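A quick way to fetch the repository is with the huggingface_hub package; a minimal sketch, assuming huggingface_hub is installed and using an arbitrary local directory name:

```python
from huggingface_hub import snapshot_download

# Download the phi-4-onnx repository from Hugging Face.
# The repo ships several variants (CPU int4, CUDA int4, DirectML);
# pass allow_patterns to fetch only the subfolder you need.
snapshot_download(repo_id="microsoft/phi-4-onnx", local_dir="./phi-4-onnx")
```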

Implementation Details

The model offers multiple optimized configurations targeting different deployment scenarios. It uses RTN-based int4 quantization to provide efficient inference across CPU, GPU, and mobile platforms, with variants tuned for specific execution providers: CPU, CUDA, and DirectML (a minimal loading sketch follows the list below).

  • CPU optimization with int4 RTN quantization (block size 32)
  • GPU-specific implementation with RTN quantization
  • DirectML support for Windows platforms
  • Cross-platform compatibility (Windows, Linux, Mac)
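
Running the model typically goes through the onnxruntime-genai package rather than raw ONNX Runtime. The sketch below follows the newer append_tokens-style generation loop; the API has changed across onnxruntime-genai releases, and the model path and chat template are assumptions to verify against the downloaded repo's genai_config.json and tokenizer config:

```python
import onnxruntime_genai as og

# Point this at the folder containing genai_config.json for the variant
# you downloaded (e.g. the CPU int4 build); the path is illustrative.
model = og.Model("./phi-4-onnx")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()  # incremental detokenizer for streaming output

# Assumed Phi-4 chat format; check the repo's tokenizer config to confirm.
prompt = "<|im_start|>user<|im_sep|>Explain int4 quantization briefly.<|im_end|><|im_start|>assistant<|im_sep|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```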

Core Capabilities

  • Efficient inference across multiple hardware platforms (see the provider check after this list)
  • Reduced memory footprint through int4 quantization
  • High-performance execution through ONNX Runtime
  • Support for advanced reasoning tasks
  • Optimized for both server and edge deployment
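
Before choosing between the CPU, CUDA, and DirectML variants, it can help to confirm which execution providers your ONNX Runtime build actually supports. A minimal check, assuming the onnxruntime Python package is installed:

```python
import onnxruntime as ort

# Lists the execution providers available in this onnxruntime build,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'] for a GPU build.
print(ort.get_available_providers())
```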

Frequently Asked Questions

Q: What makes this model unique?

This model stands out through its optimized ONNX implementation of the Phi-4 architecture, offering significant performance benefits while maintaining the base model's capabilities. The int4 quantization makes it particularly suitable for deployment in resource-constrained environments.

Q: What are the recommended use cases?

The model is ideal for scenarios requiring efficient inference on various hardware platforms, particularly where resource optimization is crucial. It's suitable for both server-side applications and edge devices, supporting advanced reasoning tasks while maintaining performance efficiency.
