gemma-2-2b-it-abliterated-GGUF
| Property | Value |
|---|---|
| Original Model | IlyaGusev/gemma-2-2b-it-abliterated |
| Quantization Tool | llama.cpp release b3496 |
| Size Range | 1.37GB - 10.46GB |
| Author | bartowski |
What is gemma-2-2b-it-abliterated-GGUF?
This is a collection of quantized versions of the Gemma 2 2B instruction-tuned abliterated model, produced with llama.cpp using imatrix calibration. The repository provides multiple compression levels to accommodate different hardware configurations and performance requirements, ranging from the highest-quality 10.46GB files down to lightweight 1.37GB variants.
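If you only need one file rather than the whole repository, a minimal sketch using the huggingface_hub library is shown below; the specific filename (a Q4_K_M pick) is an assumed example and should be checked against the repo's actual file list.

```python
from huggingface_hub import hf_hub_download

# Download a single quantized file instead of cloning the whole repository.
# The filename below (Q4_K_M) is an assumed example; pick the quant that
# matches your hardware from the repository's file listing.
model_path = hf_hub_download(
    repo_id="bartowski/gemma-2-2b-it-abliterated-GGUF",
    filename="gemma-2-2b-it-abliterated-Q4_K_M.gguf",
    local_dir="models",
)
print(f"Downloaded to: {model_path}")
```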
Implementation Details
The model uses Gemma 2's turn-based prompt format: `<bos><start_of_turn>user {prompt}<end_of_turn>` followed by `<start_of_turn>model`, with `<end_of_turn>` closing the model turn (a runnable sketch follows the list below). The files are produced with llama.cpp's quantization tooling and include both K-quants and I-quants for different use cases.
- Multiple quantization options from Q8_0 to Q2_K_L
- Special versions with Q8_0 embed/output weights for enhanced quality
- Optimized for different hardware: CPU, GPU (CUDA/ROCm), and Apple Metal
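As a rough illustration of the prompt format and GPU/CPU offload in practice, here is a sketch using the llama-cpp-python bindings (one binding option among several; the file path, context size, and offload settings are assumptions, not repository recommendations):

```python
from llama_cpp import Llama

# Load a downloaded quant. n_gpu_layers=-1 offloads every layer to the GPU
# when a CUDA/ROCm/Metal build is available; otherwise inference falls back to CPU.
llm = Llama(
    model_path="models/gemma-2-2b-it-abliterated-Q4_K_M.gguf",  # assumed local path
    n_ctx=4096,
    n_gpu_layers=-1,
)

# Build the turn-based prompt described above. The BOS token is normally
# added by the tokenizer, so it is omitted from the string here.
prompt = (
    "<start_of_turn>user\n"
    "Explain GGUF quantization in one sentence.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

output = llm(prompt, max_tokens=128, stop=["<end_of_turn>"])
print(output["choices"][0]["text"])
```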
Core Capabilities
- Flexible deployment options for various hardware configurations
- Optimized performance with imatrix calibration
- Multiple quality-size tradeoff options
- Compatible with LM Studio and other inference engines
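For the LM Studio route, one common pattern is to load a quant into LM Studio's local server and talk to it through an OpenAI-compatible client. The sketch below assumes the server's usual default address and an illustrative model identifier, neither of which comes from this repository.

```python
from openai import OpenAI

# LM Studio's local server is OpenAI-compatible; http://localhost:1234/v1 is
# its usual default address, and the api_key is a placeholder it ignores.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

# The model identifier depends on how the GGUF file was loaded into the server;
# the name below is an assumed example.
response = client.chat.completions.create(
    model="gemma-2-2b-it-abliterated",
    messages=[{"role": "user", "content": "Summarize what an imatrix quant is."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```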
Frequently Asked Questions
Q: What makes this model unique?
The model offers an extensive range of quantization options, all calibrated with an imatrix dataset, giving flexibility across hardware setups while preserving quality. The special variants that keep the embedding and output weights at Q8_0 trade a small amount of extra file size for potentially better output quality.
Q: What are the recommended use cases?
For maximum speed, choose a quant whose file size is 1-2GB smaller than your GPU's total VRAM so the whole model fits on the card. For maximum quality, size against the combined total of your system RAM and GPU VRAM instead. K-quants are recommended for general use, while I-quants offer better quality at sizes below Q4 when running with cuBLAS (Nvidia) or rocBLAS (AMD).
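That sizing rule can be turned into a tiny helper; the quant sizes below are illustrative placeholders rather than this repository's exact figures, and the 1.5GB headroom is simply the midpoint of the suggested 1-2GB margin.

```python
# Illustrative quant file sizes in GB (placeholders, not the repo's exact numbers).
QUANT_SIZES_GB = {
    "Q8_0": 2.8,
    "Q6_K": 2.2,
    "Q5_K_M": 1.9,
    "Q4_K_M": 1.7,
    "IQ3_M": 1.4,
    "Q2_K": 1.2,
}

def pick_quant(memory_budget_gb: float, headroom_gb: float = 1.5) -> str | None:
    """Pick the largest quant that fits within the memory budget minus headroom.

    For maximum speed, pass the GPU's VRAM; for maximum quality, pass the
    combined system RAM + VRAM total.
    """
    budget = memory_budget_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant(4.0))   # 4GB GPU -> "Q6_K" with these placeholder sizes
print(pick_quant(8.0))   # larger budget -> "Q8_0"
```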