IBM Granite 3.2 8B Instruct GGUF
| Property | Value |
|---|---|
| Base Model | IBM Granite 3.2 8B |
| Quantization Range | 2.84 GB to 8.68 GB |
| Original Source | huggingface.co/ibm-granite/granite-3.2-8b-instruct |
| Format | GGUF (llama.cpp compatible) |
What is ibm-granite_granite-3.2-8b-instruct-GGUF?
This is a comprehensive collection of GGUF quantizations of IBM's Granite 3.2 8B instruction-tuned language model. The repository offers 23 different quantization variants, ranging from extremely high quality (Q8_0) to highly compressed (IQ2_M), enabling users to choose the optimal balance between model size and performance for their specific hardware constraints.
Implementation Details
The model uses Granite's role-marker prompt format, with explicit system, user, and assistant turns. All quantizations were created using llama.cpp's imatrix option, which calibrates the quantization against sample text to better preserve quality at low bit widths (a command-line sketch follows the feature list below).
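Granite 3.x chat models delimit turns with `<|start_of_role|>` and `<|end_of_role|>` markers. A representative template follows; the placeholders `{system_prompt}` and `{prompt}` are illustrative, and the authoritative format is the chat template embedded in the GGUF metadata:

```text
<|start_of_role|>system<|end_of_role|>{system_prompt}<|end_of_text|>
<|start_of_role|>user<|end_of_role|>{prompt}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
```

Key features of the repository: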
- Multiple quantization types (Q8_0 through IQ2_M)
- Special variants with Q8_0 embedding weights for enhanced quality
- Online repacking support for ARM and AVX CPU inference
- Compatible with LM Studio and other llama.cpp-based projects
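As a sketch of how such imatrix quantizations are produced (the calibration data actually used for this repository is not specified here, and the file names are illustrative), llama.cpp's `llama-imatrix` and `llama-quantize` tools are invoked roughly as follows:

```shell
# Compute an importance matrix from a calibration text file
./llama-imatrix -m granite-3.2-8b-instruct-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize the full-precision model to Q4_K_M, guided by the importance matrix
./llama-quantize --imatrix imatrix.dat \
    granite-3.2-8b-instruct-f16.gguf \
    granite-3.2-8b-instruct-Q4_K_M.gguf Q4_K_M
```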
Core Capabilities
- File sizes from 2.84 GB to 8.68 GB, allowing flexible deployment
- Optimized performance across hardware architectures
- Enhanced-quality variants that keep embedding weights at Q8_0
- Support for both CPU and GPU inference (illustrated below)
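A minimal llama.cpp invocation for each mode (the file name and flag values here are assumptions; adjust `-ngl` to however many layers fit in your VRAM):

```shell
# CPU inference with 8 threads
./llama-cli -m granite-3.2-8b-instruct-Q4_K_M.gguf -t 8 \
    -p "Explain GGUF in one sentence." -n 128

# GPU inference: offload all layers (requires a CUDA, ROCm, or Metal build)
./llama-cli -m granite-3.2-8b-instruct-Q4_K_M.gguf -ngl 99 \
    -p "Explain GGUF in one sentence." -n 128
```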
Frequently Asked Questions
Q: What makes this model unique?
The model offers an exceptionally wide range of quantization options, including both traditional K-quants and newer I-quants, making it highly adaptable to different hardware configurations and performance requirements. The implementation includes special variants with Q8_0 embedding weights for enhanced quality in critical model components.
Q: What are the recommended use cases?
For maximum quality, use the Q6_K_L or Q8_0 variants. Q4_K_M is the recommended default, balancing size and performance. For limited-RAM scenarios, the I-quants (e.g., IQ3_M, IQ4_XS) offer better quality per gigabyte, especially on modern GPUs running cuBLAS (NVIDIA) or rocBLAS (AMD).
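Once you have chosen a variant, a single file can be fetched with the Hugging Face CLI. This is a sketch in which `<namespace>` stands for the repository owner's account and the include pattern matches whichever quant you picked:

```shell
# Download only the Q4_K_M file into the current directory
huggingface-cli download <namespace>/ibm-granite_granite-3.2-8b-instruct-GGUF \
    --include "*Q4_K_M.gguf" --local-dir ./
```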