gemma-2-2b-it-abliterated-GGUF
| Property | Value |
|---|---|
| Original Model | IlyaGusev/gemma-2-2b-it-abliterated |
| Quantization Tool | llama.cpp release b3496 |
| Size Range | 1.37GB - 10.46GB |
| Author | bartowski |
What is gemma-2-2b-it-abliterated-GGUF?
This is a collection of quantized versions of the Gemma 2 2B instruction-tuned abliterated model, produced with llama.cpp using imatrix calibration. The repository provides multiple compression levels to accommodate different hardware configurations and performance requirements, ranging from the highest-quality 10.46GB files down to lightweight 1.37GB variants.
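If you only need one file rather than the whole repository, a minimal sketch using the huggingface_hub library is shown below; the specific filename (a Q4_K_M pick) is an assumed example and should be checked against the repo's actual file list.

```python
from huggingface_hub import hf_hub_download

# Download a single quantized file instead of cloning the whole repository.
# The filename below (Q4_K_M) is an assumed example; pick the quant that
# matches your hardware from the repository's file listing.
model_path = hf_hub_download(
    repo_id="bartowski/gemma-2-2b-it-abliterated-GGUF",
    filename="gemma-2-2b-it-abliterated-Q4_K_M.gguf",
    local_dir="models",
)
print(f"Downloaded to: {model_path}")
```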
Implementation Details
The model uses Gemma 2's turn-based prompt format: `<bos><start_of_turn>user {prompt}<end_of_turn>` followed by `<start_of_turn>model`, with `<end_of_turn>` closing the model turn (a runnable sketch follows the list below). The files are produced with llama.cpp's quantization tooling and include both K-quants and I-quants for different use cases.
- Multiple quantization options from Q8_0 to Q2_K_L
- Special versions with Q8_0 embed/output weights for enhanced quality
- Optimized for different hardware: CPU, GPU (CUDA/ROCm), and Apple Metal
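As a rough illustration of the prompt format and GPU/CPU offload in practice, here is a sketch using the llama-cpp-python bindings (one binding option among several; the file path, context size, and offload settings are assumptions, not repository recommendations):

```python
from llama_cpp import Llama

# Load a downloaded quant. n_gpu_layers=-1 offloads every layer to the GPU
# when a CUDA/ROCm/Metal build is available; otherwise inference falls back to CPU.
llm = Llama(
    model_path="models/gemma-2-2b-it-abliterated-Q4_K_M.gguf",  # assumed local path
    n_ctx=4096,
    n_gpu_layers=-1,
)

# Build the turn-based prompt described above. The BOS token is normally
# added by the tokenizer, so it is omitted from the string here.
prompt = (
    "<start_of_turn>user\n"
    "Explain GGUF quantization in one sentence.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

output = llm(prompt, max_tokens=128, stop=["<end_of_turn>"])
print(output["choices"][0]["text"])
```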
Core Capabilities
- Flexible deployment options for various hardware configurations
- Optimized performance with imatrix calibration
- Multiple quality-size tradeoff options
- Compatible with LM Studio and other inference engines
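For the LM Studio route, one common pattern is to load a quant into LM Studio's local server and talk to it through an OpenAI-compatible client. The sketch below assumes the server's usual default address and an illustrative model identifier, neither of which comes from this repository.

```python
from openai import OpenAI

# LM Studio's local server is OpenAI-compatible; http://localhost:1234/v1 is
# its usual default address, and the api_key is a placeholder it ignores.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

# The model identifier depends on how the GGUF file was loaded into the server;
# the name below is an assumed example.
response = client.chat.completions.create(
    model="gemma-2-2b-it-abliterated",
    messages=[{"role": "user", "content": "Summarize what an imatrix quant is."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```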
Frequently Asked Questions
Q: What makes this model unique?
The model offers an extensive range of quantization options, all calibrated with an imatrix dataset, giving flexibility across hardware setups while preserving quality. The special variants that keep the embedding and output weights at Q8_0 trade a small amount of extra file size for potentially better output quality.
Q: What are the recommended use cases?
For maximum speed, choose a quant whose file size is 1-2GB smaller than your GPU's total VRAM so the whole model fits on the card. For maximum quality, size against the combined total of your system RAM and GPU VRAM instead. K-quants are recommended for general use, while I-quants offer better quality at sizes below Q4 when running with cuBLAS (Nvidia) or rocBLAS (AMD).
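That sizing rule can be turned into a tiny helper; the quant sizes below are illustrative placeholders rather than this repository's exact figures, and the 1.5GB headroom is simply the midpoint of the suggested 1-2GB margin.

```python
# Illustrative quant file sizes in GB (placeholders, not the repo's exact numbers).
QUANT_SIZES_GB = {
    "Q8_0": 2.8,
    "Q6_K": 2.2,
    "Q5_K_M": 1.9,
    "Q4_K_M": 1.7,
    "IQ3_M": 1.4,
    "Q2_K": 1.2,
}

def pick_quant(memory_budget_gb: float, headroom_gb: float = 1.5) -> str | None:
    """Pick the largest quant that fits within the memory budget minus headroom.

    For maximum speed, pass the GPU's VRAM; for maximum quality, pass the
    combined system RAM + VRAM total.
    """
    budget = memory_budget_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant(4.0))   # 4GB GPU -> "Q6_K" with these placeholder sizes
print(pick_quant(8.0))   # larger budget -> "Q8_0"
```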