phi-4-unsloth-bnb-4bit

Maintained by: unsloth

Base Model: Microsoft Phi-4 (14B)
License: MIT
Context Length: 16K tokens
Quantization: Dynamic 4-bit
Original Training: 9.8T tokens on 1,920 H100-80G GPUs
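The memory savings follow directly from bit-width. A back-of-the-envelope estimate (a sketch only; real footprints also depend on activations, KV cache, and which layers Unsloth keeps un-quantized):

```python
# Rough VRAM estimate for model weights at different precisions.
# Assumes 14B parameters; 4.5 bits/param approximates 4-bit storage
# plus quantization metadata overhead (an assumption, not a spec).

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed to store weights, in gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

params = 14e9
fp16 = weight_memory_gb(params, 16)   # full half-precision weights
int4 = weight_memory_gb(params, 4.5)  # dynamic 4-bit weights

print(f"fp16: {fp16:.1f} GB, dynamic 4-bit: {int4:.1f} GB")
print(f"reduction: {1 - int4 / fp16:.0%}")
```

This lands in the neighborhood of the ~70% memory reduction quoted below; the exact figure shifts with how many layers stay in higher precision.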

What is phi-4-unsloth-bnb-4bit?

This is an optimized version of Microsoft's Phi-4 language model, transformed using Unsloth's dynamic 4-bit quantization technology. The model maintains the powerful capabilities of the original 14B parameter Phi-4 while significantly reducing memory requirements and improving inference speed. It's been converted to use Llama's architecture for better compatibility and fine-tuning capabilities.

Implementation Details

The model uses dynamic 4-bit quantization, which keeps quantization-sensitive parameters in higher precision while compressing the rest, yielding far less accuracy loss than uniformly quantizing every layer to 4 bits. Unsloth reports roughly a 70% reduction in memory usage and up to 2x faster inference.

  • Converted to Llama architecture for improved compatibility
  • Includes Unsloth's specific bugfixes for Phi-4
  • Supports efficient fine-tuning with reduced resource requirements
  • Optimized for both inference and training scenarios
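The core idea of "dynamic" quantization can be illustrated with a toy absmax scheme. This is a deliberate simplification, not Unsloth's or bitsandbytes' actual NF4 algorithm, and the layer names and "critical" set below are hypothetical:

```python
# Toy selective 4-bit quantization: most tensors are compressed to
# signed 4-bit integers with a per-tensor scale, while layers flagged
# as critical keep their original precision. Sketch only.

def quantize_4bit(weights):
    """Absmax quantization into the signed 4-bit range [-7, 7]."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from 4-bit codes."""
    return [v * scale for v in q]

def quantize_model(layers, critical):
    """Quantize every layer except those marked critical."""
    packed = {}
    for name, weights in layers.items():
        if name in critical:
            packed[name] = ("fp16", weights)            # preserved as-is
        else:
            packed[name] = ("int4", quantize_4bit(weights))
    return packed

layers = {                      # hypothetical tiny "model"
    "attn.qkv": [0.12, -0.50, 0.33],
    "mlp.down": [1.00, -0.25, 0.75],
}
packed = quantize_model(layers, critical={"attn.qkv"})
```

The accuracy win comes from the `critical` set: a handful of outlier-heavy tensors cause most of the quantization error, so exempting them costs little memory but recovers most of the lost quality.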

Core Capabilities

  • Strong performance on MMLU (84.8%), MATH (80.4%), and HumanEval (82.6%)
  • Excels in reasoning tasks and code generation
  • 16K token context window for handling longer inputs
  • Optimized for chat-format interactions
  • Supports efficient fine-tuning on custom datasets
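Since the model is tuned for chat-format interactions, prompts should follow Phi-4's ChatML-style layout with `<|im_start|>`, `<|im_sep|>`, and `<|im_end|>` special tokens. In practice you would call `tokenizer.apply_chat_template()`; this sketch just makes the token layout explicit:

```python
# Build a prompt in Phi-4's ChatML-style chat format. Prefer
# tokenizer.apply_chat_template() in real code; this shows the layout.

def build_prompt(messages):
    """messages: list of {"role": ..., "content": ...} dicts."""
    parts = [
        f"<|im_start|>{m['role']}<|im_sep|>{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant<|im_sep|>")  # generation cue
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain 4-bit quantization briefly."},
])
```

The trailing `<|im_start|>assistant<|im_sep|>` leaves the prompt open so the model generates the assistant turn.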

Frequently Asked Questions

Q: What makes this model unique?

The model combines Microsoft's high-quality Phi-4 base model with Unsloth's innovative dynamic 4-bit quantization, offering exceptional performance while requiring significantly fewer computational resources. It's particularly notable for maintaining high accuracy while reducing memory usage by 70%.

Q: What are the recommended use cases?

The model is ideal for memory-constrained environments, latency-sensitive applications, and scenarios requiring strong reasoning capabilities. It's particularly well-suited for chat applications, code generation, and academic/scientific tasks where computational efficiency is crucial.
