# Phi-4-mini-instruct-GGUF
| Property | Value |
|---|---|
| Parameters | 3.8B |
| Context Length | 128K tokens |
| Vocabulary Size | 200K tokens |
| License | MIT |
| Training Data | 5T tokens |
| Languages | 23, including English, Chinese, and Arabic |
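If you want to check the context length and vocabulary size in the table against the upstream (non-GGUF) checkpoint, its configuration exposes both. This is a minimal sketch, assuming the base repo id `microsoft/Phi-4-mini-instruct` and an installed `transformers` library.

```python
# Sanity-check the table values against the upstream config (assumed repo id).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("microsoft/Phi-4-mini-instruct")
print(cfg.vocab_size)               # vocabulary size (~200K entries)
print(cfg.max_position_embeddings)  # context length (~128K tokens)
```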
## What is Phi-4-mini-instruct-GGUF?
Phi-4-mini-instruct-GGUF is a lightweight yet capable language model, packaged in GGUF format by Unsloth with targeted bug fixes and quantization improvements. Built on Microsoft's Phi-4 architecture, it delivers strong reasoning performance despite its modest 3.8B parameters, rivaling much larger models on mathematical reasoning and logic tasks.
## Implementation Details
The model uses a dense decoder-only Transformer architecture with grouped-query attention and shared input/output embeddings. It incorporates Flash Attention for improved efficiency and supports both chat and function-calling formats. Training was conducted on 512 A100-80G GPUs over 21 days.
- Optimized with Unsloth's Dynamic Quants for improved accuracy in 4-bit format
- Supports extensive 128K token context length
- Implements bug fixes for padding, EOS tokens, and chat templates
- Compatible with vLLM and Transformers libraries
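For a concrete picture of how the 4-bit GGUF files are typically run locally, here is a minimal sketch using `llama-cpp-python`. The repo id `unsloth/Phi-4-mini-instruct-GGUF` and the `Q4_K_M` filename pattern are assumptions; substitute the actual quant file you download.

```python
# Minimal sketch: load an assumed 4-bit quant with llama-cpp-python and chat with it.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Phi-4-mini-instruct-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",                     # assumed quant filename pattern
    n_ctx=8192,       # raise toward 128K only if you have RAM for the KV cache
    n_gpu_layers=-1,  # offload all layers to the GPU when one is available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "A train covers 60 km in 45 minutes. What is its average speed in km/h?"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

The same GGUF files can also be served through vLLM or loaded in Transformers via its GGUF path, though the exact options vary by library version.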
## Core Capabilities
- Strong performance in mathematical reasoning (88.6% on GSM8K)
- Excels in multilingual tasks across 23 languages
- Robust instruction-following abilities
- Efficient memory usage with selective quantization
- Support for both chat and function-calling formats
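To illustrate the chat and function-calling formats mentioned above, the sketch below renders both kinds of prompt through the tokenizer's bundled chat template. The `get_weather` tool schema is purely hypothetical, the tokenizer repo id `microsoft/Phi-4-mini-instruct` is an assumption, and whether the template actually consumes the `tools` argument should be verified against the repository's tokenizer configuration.

```python
# Hedged sketch: render chat and (assumed) tool-calling prompts with the chat template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-instruct")  # assumed repo id

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather in Paris right now?"},
]

# Hypothetical tool definition in the common JSON-schema style.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Plain chat prompt.
chat_prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Function-calling prompt; template support for `tools` should be confirmed.
tool_prompt = tok.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)

print(chat_prompt)
print(tool_prompt)
```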
## Frequently Asked Questions
### Q: What makes this model unique?
The model stands out for achieving near-larger-model performance with only 3.8B parameters, particularly in reasoning tasks. It combines efficient architecture choices with Unsloth's optimizations for improved inference speed and reduced memory usage.
### Q: What are the recommended use cases?
The model is ideal for memory-constrained environments, latency-sensitive applications, and tasks requiring strong reasoning capabilities. It's particularly well-suited for mathematical problems, logical reasoning, and multilingual applications where efficiency is crucial.