dolphin-2.9.4-gemma2-2b-GGUF

Maintained By
bartowski

Original Model: cognitivecomputations/dolphin-2.9.4-gemma2-2b
Author: bartowski
Size Range: 1.39GB - 5.24GB
Format: GGUF (Various Quantizations)

What is dolphin-2.9.4-gemma2-2b-GGUF?

This is a comprehensive collection of quantized versions of the Dolphin 2.9.4 model built on Google's Gemma 2 2B, covering a range of hardware configurations and use cases. The quantizations were produced with llama.cpp's imatrix (importance matrix) method, which calibrates each compression level against sample data to minimize quality loss.
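For example, a single variant can be fetched with the huggingface_hub Python library. This is a minimal sketch, not part of the original card; the filename follows bartowski's usual naming convention and should be verified against the repository's file list.

```python
from huggingface_hub import hf_hub_download

# Download one quantization variant; the filename is assumed to follow
# the repo's naming convention -- verify it in the repository's file list.
model_path = hf_hub_download(
    repo_id="bartowski/dolphin-2.9.4-gemma2-2b-GGUF",
    filename="dolphin-2.9.4-gemma2-2b-Q4_K_M.gguf",  # assumed filename
    local_dir="./models",
)
print(f"Saved to {model_path}")
```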

Implementation Details

The model comes in multiple quantization variants, each optimized for a different scenario; a loading sketch follows the list:

  • Full F16 weights (5.24GB) for maximum quality
  • Q8_0 (2.78GB) for extremely high quality applications
  • Q6_K variants (2.15-2.29GB) for very high quality with reasonable size
  • Q5_K variants (1.88-2.07GB) for balanced performance
  • Q4_K variants (1.64-1.85GB) for good quality with smaller size
  • Q3_K and IQ3_M variants (1.39-1.69GB) for minimal resource usage
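As a hedged sketch of how any of these files is used (not part of the original card), a variant can be loaded with the llama-cpp-python bindings; the model path below assumes the Q4_K_M file was downloaded as shown earlier.

```python
from llama_cpp import Llama

# Load a GGUF variant; n_gpu_layers=-1 offloads all layers to the GPU
# (cuBLAS/rocBLAS builds), while n_gpu_layers=0 runs on CPU only.
llm = Llama(
    model_path="./models/dolphin-2.9.4-gemma2-2b-Q4_K_M.gguf",  # assumed path
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # set to 0 for CPU-only inference
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```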

Core Capabilities

  • Multiple quantization options for different hardware configurations
  • Special variants with Q8_0 embed/output weights for enhanced quality
  • Support for various inference backends (cuBLAS, rocBLAS, CPU)
  • Optimized performance through imatrix calibration
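To discover exactly which of the quantization options listed above are published, the repository contents can be enumerated with huggingface_hub. A minimal sketch, assuming network access to Hugging Face:

```python
from huggingface_hub import list_repo_files

# Enumerate the available quantization files in the repository.
files = list_repo_files("bartowski/dolphin-2.9.4-gemma2-2b-GGUF")
for f in sorted(files):
    if f.endswith(".gguf"):
        print(f)
```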

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its comprehensive range of quantization options, letting users pick the balance between model size and output quality that fits their hardware. The collection also includes special variants with Q8_0 embed/output weights, which may offer better quality in some scenarios.

Q: What are the recommended use cases?

For users with limited VRAM (4-6GB), the Q4_K_M or IQ4_XS variants are recommended. Those prioritizing quality and having sufficient resources should consider the Q6_K_L variant, which offers near-lossless quality. Base your selection on available RAM/VRAM and on whether you are running on CPU, NVIDIA (cuBLAS), or AMD (rocBLAS/Vulkan) hardware; a hypothetical selection helper is sketched below.
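As an illustration of this guidance, the helper below (hypothetical, not part of the repository) maps free RAM/VRAM to one of the variants listed above; the thresholds are derived from the file sizes in this card, with headroom left for the KV cache and runtime overhead.

```python
def pick_quant(free_mem_gb: float) -> str:
    """Suggest a quantization variant for this model given free RAM/VRAM in GB.

    Hypothetical helper; thresholds come from the file sizes listed above,
    with headroom left for the KV cache and runtime overhead.
    """
    if free_mem_gb >= 6.0:
        return "F16"      # 5.24GB, maximum quality
    if free_mem_gb >= 3.5:
        return "Q8_0"     # 2.78GB, extremely high quality
    if free_mem_gb >= 3.0:
        return "Q6_K"     # ~2.2GB, very high quality
    if free_mem_gb >= 2.5:
        return "Q4_K_M"   # ~1.8GB, good quality, fits 4-6GB VRAM comfortably
    return "IQ3_M"        # ~1.4GB, minimal resource usage

print(pick_quant(4.0))  # -> "Q8_0"
```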
