Phi-3-medium-128k-instruct-GGUF
| Property | Value |
|---|---|
| Parameter Count | 14B |
| License | MIT |
| Base Model | microsoft/Phi-3-medium-128k-instruct |
| Quantization Author | bartowski |
What is Phi-3-medium-128k-instruct-GGUF?
This is a comprehensive suite of quantized versions of Microsoft's Phi-3-medium-128k-instruct model, targeting different hardware configurations and use cases. The quantizations were produced with llama.cpp and range from full F32 weights (55.84GB) down to the highly compressed IQ2_XXS (3.72GB).
Implementation Details
The repository uses both traditional K-quants and newer I-quants, and some variants keep the embedding and output weights at Q8_0 for higher quality. All quants retain the 128k context window and follow the prompt format "<|user|> {prompt}<|end|><|assistant|><|end|>"; a minimal usage sketch appears after the list below.
- Multiple quantization options from Q8_0 to IQ2_XXS
- Special versions with Q8_0 embed/output weights for enhanced quality
- Supports multilingual input and code generation
- Optimized for various hardware configurations (CPU, GPU, Apple Metal)
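
As an illustration of the prompt format and hardware options above, here is a minimal sketch using the llama-cpp-python bindings (an assumption; they are not part of this repository). The local filename and context size are placeholders you would adjust to whichever quant you downloaded:

```python
from llama_cpp import Llama

# Load a local GGUF quant; the filename is a placeholder for whichever
# quant level you downloaded from the repository.
llm = Llama(
    model_path="Phi-3-medium-128k-instruct-Q4_K_M.gguf",
    n_ctx=8192,        # the model supports up to 128k, at a steep RAM cost
    n_gpu_layers=-1,   # offload all layers to GPU/Metal; use 0 for CPU-only
)

# Phi-3 prompt format from the model card; the model emits <|end|>
# when it is done, so we use it as the stop token.
prompt = "<|user|> Explain GGUF quantization in one sentence.<|end|><|assistant|>"
out = llm(prompt, max_tokens=128, stop=["<|end|>"])
print(out["choices"][0]["text"].strip())
```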
Core Capabilities
- Text generation with extended context window
- Code generation and completion
- Multilingual support
- Conversational AI applications (see the chat sketch below)
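
For conversational use, llama-cpp-python can apply the chat template embedded in the GGUF file, so you can pass OpenAI-style messages instead of formatting the prompt by hand. A hedged multi-turn sketch, reusing the `llm` object from the previous example:

```python
# Multi-turn chat using the GGUF's embedded chat template
# (reuses the `llm` object created in the earlier sketch).
messages = [
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
resp = llm.create_chat_completion(messages=messages, max_tokens=256)
reply = resp["choices"][0]["message"]["content"]

# Append the reply and continue the conversation with a follow-up turn.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Now add type hints."})
resp = llm.create_chat_completion(messages=messages, max_tokens=256)
print(resp["choices"][0]["message"]["content"])
```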
Frequently Asked Questions
Q: What makes this model unique?
This model offers an exceptionally wide range of quantization options, letting users trade file size against output quality to fit their hardware. The availability of both K-quants and I-quants, along with variants that keep embedding/output weights at Q8_0, provides flexibility across use cases; a rough way to estimate file sizes is sketched below.
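
A rough way to reason about the size/quality trade-off: file size scales with bits per weight, size ≈ parameters × bpw / 8. The sketch below uses approximate, community-reported bpw figures for llama.cpp quant types (they are not taken from this model card), so treat the outputs as estimates:

```python
# Rough size estimate: file_size_GB ≈ parameters × bits_per_weight / 8 / 1e9
PARAMS = 14e9  # Phi-3-medium parameter count

# Approximate bits-per-weight for a few llama.cpp quant types; these are
# rough community-reported figures, not values from this model card.
BPW = {
    "F32": 32.0,
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q4_K_M": 4.85,
    "IQ2_XXS": 2.06,
}

def estimated_size_gb(quant: str) -> float:
    """Estimate the GGUF file size in GB for a given quant type."""
    return PARAMS * BPW[quant] / 8 / 1e9

for q in BPW:
    print(f"{q:8s} ~{estimated_size_gb(q):5.2f} GB")
# F32 comes out near 56 GB and IQ2_XXS near 3.6 GB, roughly consistent
# with the 55.84GB and 3.72GB figures quoted above (actual files also
# carry metadata and some higher-precision tensors).
```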
Q: What are the recommended use cases?
For highest quality, use the Q6_K_L or Q6_K versions. For a good balance of size and quality, Q4_K_M is recommended. On systems with limited RAM, the IQ3 and IQ2 versions remain surprisingly usable at much smaller sizes. The model suits text generation, code completion, and conversational AI applications; a sketch for downloading a single quant follows.
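
To fetch one recommended quant without cloning the whole repository, one option is huggingface_hub's hf_hub_download. The filename below follows bartowski's usual naming pattern and is an assumption; check it against the repo's file list:

```python
from huggingface_hub import hf_hub_download

# Download a single quant file rather than the whole repository.
path = hf_hub_download(
    repo_id="bartowski/Phi-3-medium-128k-instruct-GGUF",
    filename="Phi-3-medium-128k-instruct-Q4_K_M.gguf",  # assumed naming pattern
)
print(path)  # local path to the cached GGUF file
```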