google_gemma-3-12b-it-GGUF

Maintained By
bartowski

Google Gemma 3 12B GGUF Quantized

Property                Value
Original Model          google/gemma-3-12b-it
Quantization Framework  llama.cpp (build b4877)
Size Range              4.02GB - 23.54GB
Vision Support          Yes (requires MMPROJ file)

What is google_gemma-3-12b-it-GGUF?

This is a comprehensive collection of quantized versions of Google's Gemma 3 12B instruction-tuned model, optimized for efficient deployment with llama.cpp. The repository offers multiple quantization levels, from full BF16 precision (23.54GB) down to the highly compressed IQ2_S format (4.02GB), letting users balance quality against resource requirements.
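As a starting point, a single variant can be fetched programmatically with huggingface_hub. This is a minimal sketch; the filename follows bartowski's usual naming pattern and is an assumption here, so confirm it against the repository's file list:

```python
# Minimal sketch: download one quantized variant from the repo.
# The filename is assumed from bartowski's naming convention --
# verify it against the repository's file list before use.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="bartowski/google_gemma-3-12b-it-GGUF",
    filename="google_gemma-3-12b-it-Q4_K_M.gguf",  # ~7.30GB, the default pick
)
print(model_path)  # local path to the cached GGUF file
```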

Implementation Details

The quantizations were produced with llama.cpp using an importance matrix (imatrix) for calibration, and vision is supported through separate MMPROJ projector files (a download sketch follows the list below). Both standard K-quants and newer I-quants are provided, with specific optimizations for different hardware architectures, including ARM and AVX systems.

  • Multiple quantization options from Q8_0 to IQ2_S
  • Vision support through dedicated MMPROJ files (854MB-1.69GB)
  • Online weight repacking for ARM and AVX optimization
  • Special embed/output weight handling in certain variants
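For vision use, the MMPROJ projector must sit alongside the model and be handed to a multimodal-capable runtime. A sketch, with the projector filename assumed from this repo's naming scheme:

```python
# Sketch: fetch the vision projector (MMPROJ) next to the model weights.
# The exact filename is an assumption based on this repo's naming scheme;
# check the file list (the projectors range from ~854MB to ~1.69GB).
from huggingface_hub import hf_hub_download

mmproj_path = hf_hub_download(
    repo_id="bartowski/google_gemma-3-12b-it-GGUF",
    filename="mmproj-google_gemma-3-12b-it-f16.gguf",  # assumed name; verify
)
# Pass this file to a multimodal runtime (e.g. llama.cpp's multimodal
# tooling) together with the main GGUF; text-only use does not need it.
```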

Core Capabilities

  • Text generation with various quality-size tradeoffs
  • Vision processing capabilities
  • Optimized performance on different hardware architectures
  • Support for both CPU and GPU inference
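To illustrate the CPU/GPU point above, the llama-cpp-python bindings expose GPU offload as a single setting; this sketch assumes the Q4_K_M file downloaded earlier:

```python
# Minimal inference sketch with the llama-cpp-python bindings.
# n_gpu_layers=-1 offloads all layers to the GPU when a GPU-enabled
# build is installed; set it to 0 for pure CPU inference.
from llama_cpp import Llama

llm = Llama(
    model_path="google_gemma-3-12b-it-Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # -1 = offload everything, 0 = CPU only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```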

Frequently Asked Questions

Q: What makes this model unique?

This implementation stands out for its comprehensive range of quantization options and built-in vision capabilities, along with specialized optimizations for different hardware architectures. It's particularly notable for including both traditional K-quants and newer I-quants, offering users flexibility in choosing between performance and quality.

Q: What are the recommended use cases?

For maximum quality, the Q6_K_L variant (9.90GB) is recommended. For balanced performance, Q4_K_M (7.30GB) is suggested as the default choice. For systems with limited RAM, the I-quant series (IQ4_XS, IQ3_M) offers good performance at smaller sizes. The model is suitable for both CPU and GPU deployment, with specific variants optimized for different hardware configurations.
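A back-of-the-envelope way to apply these recommendations is to pick the largest file that fits your memory budget with some headroom for the KV cache. The helper below is purely illustrative, using only the sizes quoted on this page; the 1.2x headroom factor is a rule of thumb, not part of the repo:

```python
# Illustrative picker, using only the file sizes quoted above.
# The 1.2x headroom factor is a rough allowance for KV cache and
# activations -- an assumption, not a guarantee.
VARIANTS = [           # (name, file size in GB), largest first
    ("BF16",   23.54),
    ("Q6_K_L",  9.90),
    ("Q4_K_M",  7.30),
    ("IQ2_S",   4.02),
]

def pick_variant(budget_gb: float, headroom: float = 1.2) -> str:
    """Return the largest listed variant that fits the memory budget."""
    for name, size_gb in VARIANTS:
        if size_gb * headroom <= budget_gb:
            return name
    raise ValueError("No listed variant fits; consider more offloading.")

print(pick_variant(10.0))  # -> 'Q4_K_M' (9.90GB * 1.2 exceeds a 10GB budget)
```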
