# dolphin-2.9.4-gemma2-2b-GGUF
| Property | Value |
|---|---|
| Original Model | cognitivecomputations/dolphin-2.9.4-gemma2-2b |
| Author | bartowski |
| Size Range | 1.39GB - 5.24GB |
| Format | GGUF (various quantizations) |
## What is dolphin-2.9.4-gemma2-2b-GGUF?
This is a collection of quantized versions of the Dolphin 2.9.4 Gemma 2 2B model, covering a range of hardware configurations and use cases. The quantizations were produced with llama.cpp using importance matrix (imatrix) calibration, providing several compression levels that trade file size against output quality.
## Implementation Details
The model comes in multiple quantization variants, each optimized for different scenarios:
- Full F16 weights (5.24GB) for maximum quality
- Q8_0 (2.78GB) for extremely high quality applications
- Q6_K variants (2.15-2.29GB) for very high quality with reasonable size
- Q5_K variants (1.88-2.07GB) for balanced performance
- Q4_K variants (1.64-1.85GB) for good quality with smaller size
- Q3_K and IQ3_M variants (1.39-1.69GB) for minimal resource usage
## Core Capabilities
- Multiple quantization options for different hardware configurations
- Special variants with Q8_0 embed/output weights for enhanced quality
- Support for various inference backends (cuBLAS, rocBLAS, CPU)
- Optimized performance through imatrix calibration
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its comprehensive range of quantization options, letting users choose the balance between model size and quality that best fits their hardware. The release also includes special variants that keep the embedding and output weights at Q8_0, which may improve quality in some scenarios at a modest size cost.
**Q: What are the recommended use cases?**
For users with limited VRAM (4-6GB), the Q4_K_M or IQ4_XS variants are recommended. For those prioritizing quality and having sufficient resources, the Q6_K_L variant offers quality close to the full-precision weights. Base the selection on available RAM/VRAM and on whether you're running on CPU, NVIDIA (cuBLAS), or AMD (rocBLAS/Vulkan) hardware.
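The rule of thumb above can be expressed as a tiny decision function. The VRAM thresholds and the `prefer_small` switch are illustrative assumptions layered on the FAQ's guidance, not official cutoffs:

```python
def recommend_variant(vram_gb: float, prefer_small: bool = False) -> str:
    """Mirror the guidance above: Q6_K_L when there is plenty of memory,
    Q4_K_M or IQ4_XS in the 4-6GB range, and a compact IQ3_M below that.
    Thresholds are assumptions chosen for illustration."""
    if vram_gb >= 8:
        return "Q6_K_L"
    if vram_gb >= 4:
        # IQ4_XS saves a few hundred MB at a small quality cost.
        return "IQ4_XS" if prefer_small else "Q4_K_M"
    return "IQ3_M"

print(recommend_variant(5))                     # typical 4-6GB GPU
print(recommend_variant(5, prefer_small=True))  # same GPU, tighter budget
```

A real chooser would also account for context length and whether layers are offloaded to the GPU, but this captures the size/quality trade-off the FAQ describes.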