ibm-granite_granite-3.2-8b-instruct-GGUF

Maintained by: bartowski

IBM Granite 3.2 8B Instruct GGUF

Property             Value
-------------------  --------------------------------------------------
Base Model           IBM Granite 3.2 8B
Quantization Range   2.84 GB - 8.68 GB
Original Source      huggingface.co/ibm-granite/granite-3.2-8b-instruct
Format               GGUF (llama.cpp compatible)

What is ibm-granite_granite-3.2-8b-instruct-GGUF?

This is a comprehensive collection of GGUF quantizations of IBM's Granite 3.2 8B instruction-tuned language model. The repository offers 23 different quantization variants, ranging from extremely high quality (Q8_0) to highly compressed (IQ2_M), enabling users to choose the optimal balance between model size and performance for their specific hardware constraints.

Implementation Details

The model follows the Granite 3.x chat template, with explicit role markers for the system, user, and assistant turns (see the example below). All quantizations were created with llama.cpp's imatrix option, which calibrates the quantization against a sample dataset to better preserve quality at smaller file sizes.
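For reference, a prompt in the Granite 3.x template looks like the following sketch, where `{system_prompt}` and `{prompt}` are placeholders for your own text:

```
<|start_of_role|>system<|end_of_role|>{system_prompt}<|end_of_text|>
<|start_of_role|>user<|end_of_role|>{prompt}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
```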

  • Multiple quantization types (Q8_0 through IQ2_M)
  • Special variants with Q8_0 embedding weights for enhanced quality
  • Online repacking support for ARM and AVX CPU inference
  • Compatible with LM Studio and other llama.cpp-based projects
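To fetch a single quantization rather than cloning the whole repository, something like the following sketch works with the `huggingface_hub` library. The filename here assumes bartowski's usual `<repo-name>-<quant>.gguf` naming pattern, so verify it against the repository's actual file listing:

```python
from huggingface_hub import hf_hub_download

# Download one quant file; the filename follows the repo's assumed
# naming pattern -- check the actual file list before running.
model_path = hf_hub_download(
    repo_id="bartowski/ibm-granite_granite-3.2-8b-instruct-GGUF",
    filename="ibm-granite_granite-3.2-8b-instruct-Q4_K_M.gguf",
)
print(model_path)
```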

Core Capabilities

  • Flexible deployment options from 2.84GB to 8.68GB file sizes
  • Optimized performance on various hardware architectures
  • Enhanced quality options with specialized embedding weight quantization
  • Support for both CPU and GPU inference
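As a minimal sketch of local inference with llama-cpp-python (the model path and generation settings here are illustrative, not prescribed by the repository):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU; set 0 for CPU-only.
llm = Llama(
    model_path="ibm-granite_granite-3.2-8b-instruct-Q4_K_M.gguf",
    n_gpu_layers=-1,
    n_ctx=4096,
)

# The GGUF metadata carries the chat template, so the chat API
# applies the Granite role markers shown above automatically.
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GGUF quantization in one sentence."},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```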

Frequently Asked Questions

Q: What makes this model unique?

The model offers an exceptionally wide range of quantization options, including both traditional K-quants and newer I-quants, making it highly adaptable to different hardware configurations and performance requirements. The implementation includes special variants with Q8_0 embedding weights for enhanced quality in critical model components.

Q: What are the recommended use cases?

For maximum quality, use Q6_K_L or Q8_0 variants. For balanced performance, Q4_K_M is recommended as the default choice. For limited RAM scenarios, the I-quants (IQ3_M, IQ4_XS) offer good performance-to-size ratios, especially on modern GPUs using cuBLAS or rocBLAS.
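As a rough rule of thumb, that guidance can be reduced to a simple lookup keyed on available (V)RAM, leaving headroom for the KV cache and runtime overhead. Note that only the Q8_0 (8.68 GB) and IQ2_M (2.84 GB) sizes come from the table above; the intermediate cutoffs below are rough estimates:

```python
def pick_quant(budget_gb: float) -> str:
    """Map an available (V)RAM budget to a suggested quant.

    Thresholds leave ~1-2 GB of headroom for context/KV cache.
    Only the Q8_0 (8.68 GB) and IQ2_M (2.84 GB) sizes are from the
    repo listing; the intermediate cutoffs are approximations.
    """
    if budget_gb >= 10.5:
        return "Q8_0"     # maximum quality
    if budget_gb >= 9:
        return "Q6_K_L"   # near-lossless, Q8_0 embedding weights
    if budget_gb >= 7:
        return "Q4_K_M"   # recommended default
    if budget_gb >= 6:
        return "IQ4_XS"   # good size/quality trade-off
    if budget_gb >= 5:
        return "IQ3_M"    # low-RAM option
    return "IQ2_M"        # smallest available (2.84 GB)


print(pick_quant(8.0))  # -> "Q4_K_M"
```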
