# Phi-4-mini-instruct-GGUF
| Property | Value |
|---|---|
| Parameters | 3.8B |
| Context Length | 128K tokens |
| Vocabulary Size | 200K tokens |
| License | MIT |
| Training Data | 5T tokens |
| Languages | 23, including English, Chinese, and Arabic |
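If you want to check the context length and vocabulary size in the table against the upstream (non-GGUF) checkpoint, its configuration exposes both. This is a minimal sketch, assuming the base repo id `microsoft/Phi-4-mini-instruct` and an installed `transformers` library.

```python
# Sanity-check the table values against the upstream config (assumed repo id).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("microsoft/Phi-4-mini-instruct")
print(cfg.vocab_size)               # vocabulary size (~200K entries)
print(cfg.max_position_embeddings)  # context length (~128K tokens)
```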
## What is Phi-4-mini-instruct-GGUF?
Phi-4-mini-instruct-GGUF is a lightweight yet capable language model, packaged in GGUF format by Unsloth with targeted bug fixes and quantization improvements. Built on Microsoft's Phi-4 architecture, it delivers strong reasoning performance despite its modest 3.8B parameters, rivaling much larger models on mathematical reasoning and logic tasks.
## Implementation Details
The model uses a dense decoder-only Transformer architecture with grouped-query attention and shared input/output embeddings. It incorporates Flash Attention for improved efficiency and supports both chat and function-calling formats. Training was conducted on 512 A100-80G GPUs over 21 days.
- Optimized with Unsloth's Dynamic Quants for improved accuracy in 4-bit format
- Supports extensive 128K token context length
- Implements bug fixes for padding, EOS tokens, and chat templates
- Compatible with vLLM and Transformers libraries
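For a concrete picture of how the 4-bit GGUF files are typically run locally, here is a minimal sketch using `llama-cpp-python`. The repo id `unsloth/Phi-4-mini-instruct-GGUF` and the `Q4_K_M` filename pattern are assumptions; substitute the actual quant file you download.

```python
# Minimal sketch: load an assumed 4-bit quant with llama-cpp-python and chat with it.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Phi-4-mini-instruct-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",                     # assumed quant filename pattern
    n_ctx=8192,       # raise toward 128K only if you have RAM for the KV cache
    n_gpu_layers=-1,  # offload all layers to the GPU when one is available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "A train covers 60 km in 45 minutes. What is its average speed in km/h?"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

The same GGUF files can also be served through vLLM or loaded in Transformers via its GGUF path, though the exact options vary by library version.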
## Core Capabilities
- Strong performance in mathematical reasoning (88.6% on GSM8K)
- Excels in multilingual tasks across 23 languages
- Robust instruction-following abilities
- Efficient memory usage with selective quantization
- Support for both chat and function-calling formats
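To illustrate the chat and function-calling formats mentioned above, the sketch below renders both kinds of prompt through the tokenizer's bundled chat template. The `get_weather` tool schema is purely hypothetical, the tokenizer repo id `microsoft/Phi-4-mini-instruct` is an assumption, and whether the template actually consumes the `tools` argument should be verified against the repository's tokenizer configuration.

```python
# Hedged sketch: render chat and (assumed) tool-calling prompts with the chat template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-instruct")  # assumed repo id

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather in Paris right now?"},
]

# Hypothetical tool definition in the common JSON-schema style.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Plain chat prompt.
chat_prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Function-calling prompt; template support for `tools` should be confirmed.
tool_prompt = tok.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)

print(chat_prompt)
print(tool_prompt)
```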
## Frequently Asked Questions
### Q: What makes this model unique?
The model stands out for achieving near-larger-model performance with only 3.8B parameters, particularly in reasoning tasks. It combines efficient architecture choices with Unsloth's optimizations for improved inference speed and reduced memory usage.
### Q: What are the recommended use cases?
The model is ideal for memory-constrained environments, latency-sensitive applications, and tasks requiring strong reasoning capabilities. It's particularly well-suited for mathematical problems, logical reasoning, and multilingual applications where efficiency is crucial.