Phi-3-medium-128k-instruct-GGUF
| Property | Value |
|---|---|
| Parameter Count | 14B |
| License | MIT |
| Base Model | microsoft/Phi-3-medium-128k-instruct |
| Quantization Author | bartowski |
What is Phi-3-medium-128k-instruct-GGUF?
This is a comprehensive suite of quantized versions of Microsoft's Phi-3-medium-128k-instruct model, targeting different hardware configurations and use cases. The quantizations were produced with llama.cpp and range from full F32 weights (55.84GB) down to the highly compressed IQ2_XXS (3.72GB).
Implementation Details
The repository uses both traditional K-quants and newer I-quants, and some variants keep the embedding and output weights at Q8_0 for higher quality. All quants retain the 128k context window and follow the prompt format "<|user|> {prompt}<|end|><|assistant|><|end|>"; a minimal usage sketch appears after the list below.
- Multiple quantization options from Q8_0 to IQ2_XXS
- Special versions with Q8_0 embed/output weights for enhanced quality
- Supports multilingual input and code generation
- Optimized for various hardware configurations (CPU, GPU, Apple Metal)
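
As an illustration of the prompt format and hardware options above, here is a minimal sketch using the llama-cpp-python bindings (an assumption; they are not part of this repository). The local filename and context size are placeholders you would adjust to whichever quant you downloaded:

```python
from llama_cpp import Llama

# Load a local GGUF quant; the filename is a placeholder for whichever
# quant level you downloaded from the repository.
llm = Llama(
    model_path="Phi-3-medium-128k-instruct-Q4_K_M.gguf",
    n_ctx=8192,        # the model supports up to 128k, at a steep RAM cost
    n_gpu_layers=-1,   # offload all layers to GPU/Metal; use 0 for CPU-only
)

# Phi-3 prompt format from the model card; the model emits <|end|>
# when it is done, so we use it as the stop token.
prompt = "<|user|> Explain GGUF quantization in one sentence.<|end|><|assistant|>"
out = llm(prompt, max_tokens=128, stop=["<|end|>"])
print(out["choices"][0]["text"].strip())
```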
Core Capabilities
- Text generation with extended context window
- Code generation and completion
- Multilingual support
- Conversational AI applications (see the chat sketch below)
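
For conversational use, llama-cpp-python can apply the chat template embedded in the GGUF file, so you can pass OpenAI-style messages instead of formatting the prompt by hand. A hedged multi-turn sketch, reusing the `llm` object from the previous example:

```python
# Multi-turn chat using the GGUF's embedded chat template
# (reuses the `llm` object created in the earlier sketch).
messages = [
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
resp = llm.create_chat_completion(messages=messages, max_tokens=256)
reply = resp["choices"][0]["message"]["content"]

# Append the reply and continue the conversation with a follow-up turn.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Now add type hints."})
resp = llm.create_chat_completion(messages=messages, max_tokens=256)
print(resp["choices"][0]["message"]["content"])
```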
Frequently Asked Questions
Q: What makes this model unique?
This model offers an exceptionally wide range of quantization options, letting users trade file size against output quality to fit their hardware. The availability of both K-quants and I-quants, along with variants that keep embedding/output weights at Q8_0, provides flexibility across use cases; a rough way to estimate file sizes is sketched below.
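
A rough way to reason about the size/quality trade-off: file size scales with bits per weight, size ≈ parameters × bpw / 8. The sketch below uses approximate, community-reported bpw figures for llama.cpp quant types (they are not taken from this model card), so treat the outputs as estimates:

```python
# Rough size estimate: file_size_GB ≈ parameters × bits_per_weight / 8 / 1e9
PARAMS = 14e9  # Phi-3-medium parameter count

# Approximate bits-per-weight for a few llama.cpp quant types; these are
# rough community-reported figures, not values from this model card.
BPW = {
    "F32": 32.0,
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q4_K_M": 4.85,
    "IQ2_XXS": 2.06,
}

def estimated_size_gb(quant: str) -> float:
    """Estimate the GGUF file size in GB for a given quant type."""
    return PARAMS * BPW[quant] / 8 / 1e9

for q in BPW:
    print(f"{q:8s} ~{estimated_size_gb(q):5.2f} GB")
# F32 comes out near 56 GB and IQ2_XXS near 3.6 GB, roughly consistent
# with the 55.84GB and 3.72GB figures quoted above (actual files also
# carry metadata and some higher-precision tensors).
```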
Q: What are the recommended use cases?
For highest quality, use the Q6_K_L or Q6_K versions. For a good balance of size and quality, Q4_K_M is recommended. On systems with limited RAM, the IQ3 and IQ2 versions remain surprisingly usable at much smaller sizes. The model suits text generation, code completion, and conversational AI applications; a sketch for downloading a single quant follows.
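
To fetch one recommended quant without cloning the whole repository, one option is huggingface_hub's hf_hub_download. The filename below follows bartowski's usual naming pattern and is an assumption; check it against the repo's file list:

```python
from huggingface_hub import hf_hub_download

# Download a single quant file rather than the whole repository.
path = hf_hub_download(
    repo_id="bartowski/Phi-3-medium-128k-instruct-GGUF",
    filename="Phi-3-medium-128k-instruct-Q4_K_M.gguf",  # assumed naming pattern
)
print(path)  # local path to the cached GGUF file
```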