Mistral-Small-24b-Sertraline-0304-GGUF
| Property | Value |
|---|---|
| Base Model | Mistral-Small-24b-Sertraline |
| Quantization Options | Q2 – Q8 variants |
| Size Range | 7.21 GB – 25.05 GB |
| Model URL | huggingface.co/bartowski/allura-org_Mistral-Small-24b-Sertraline-0304-GGUF |
What is allura-org_Mistral-Small-24b-Sertraline-0304-GGUF?
This is a comprehensive collection of GGUF quantized versions of the Mistral-Small-24b-Sertraline model, offering a range of compression levels to suit different hardware capabilities and performance requirements. The quantizations were produced with llama.cpp release b4792 using the imatrix (importance matrix) calibration option.
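To fetch a single variant rather than the whole repository, the file can be downloaded directly. Below is a minimal sketch using the `huggingface_hub` client; the `filename` follows bartowski's usual naming scheme and is an assumption, so verify it against the repo's file listing first.

```python
# Minimal sketch: download one quant file from the repo.
# The filename is an assumption based on bartowski's usual naming
# convention -- check it against the repository's file listing.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="bartowski/allura-org_Mistral-Small-24b-Sertraline-0304-GGUF",
    filename="allura-org_Mistral-Small-24b-Sertraline-0304-Q4_K_M.gguf",
)
print(model_path)  # local cache path of the downloaded GGUF file
```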
Implementation Details
The collection spans variants from the high-quality Q8_0 (25.05 GB) down to the highly compressed IQ2_XS (7.21 GB), each striking a different balance between model size, inference speed, and output quality. Certain variants (Q3_K_XL, Q4_K_L) quantize the embedding and output weights at Q8_0 rather than the variant's default precision, which should yield higher quality for those layers.
- Supports online repacking for ARM and AVX CPU inference
- Implements state-of-the-art (SOTA) compression techniques in the IQ variants
- Includes specialized quantizations for different hardware architectures
- Features a standardized prompt format with system prompt support (see the chat sketch after this list)
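The prompt format point above can be exercised through the `llama-cpp-python` bindings, which read the chat template stored in the GGUF metadata and apply it automatically. A minimal sketch, assuming the `model_path` from the download step above:

```python
# Minimal chat sketch via llama-cpp-python; the GGUF metadata carries the
# model's chat template, so system/user roles are formatted automatically.
from llama_cpp import Llama

llm = Llama(model_path=model_path, n_ctx=4096)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain GGUF quantization in one sentence."},
    ],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])
```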
Core Capabilities
- Multiple quantization options for various hardware configurations
- Optimized performance on both CPU and GPU platforms
- Support for multiple inference backends (cuBLAS, rocBLAS, Metal); a GPU offload sketch follows this list
- Flexible deployment options through llama.cpp-based projects
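As a sketch of GPU deployment, `llama-cpp-python` exposes layer offloading through its `n_gpu_layers` parameter. This assumes the package was built against a GPU backend, e.g. `CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python` for CUDA:

```python
# Sketch of GPU offload; requires llama-cpp-python built with a GPU backend.
from llama_cpp import Llama

llm = Llama(
    model_path=model_path,  # path from the download step above
    n_gpu_layers=-1,        # -1 offloads all layers; lower it if VRAM is tight
    n_ctx=8192,
)
```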
Frequently Asked Questions
Q: What makes this model unique?
The model provides an extensive range of quantization options, from extremely high quality (Q8_0) to highly compressed (IQ2_XS), allowing users to choose the optimal balance between model size and performance for their specific hardware constraints.
Q: What are the recommended use cases?
For maximum quality, use the Q6_K_L or Q6_K variants. For a balanced default, Q4_K_M is recommended. Where RAM is limited, the IQ3 and IQ2 variants remain surprisingly usable at much smaller sizes; a selection helper is sketched below.
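As an illustration of that trade-off, here is a hypothetical helper that picks the largest variant fitting a given memory budget. Only the two sizes stated on this page are filled in; the remaining variants would need to be copied from the repository's file listing.

```python
# Hypothetical helper: choose the largest quant that fits a memory budget.
# Only Q8_0 and IQ2_XS sizes come from this page; the rest are omitted.
SIZES_GB = {
    "Q8_0": 25.05,
    "IQ2_XS": 7.21,
    # add the remaining variants from the repository file list
}

def pick_quant(budget_gb: float, sizes: dict = SIZES_GB) -> str | None:
    """Return the largest variant not exceeding budget_gb, or None."""
    fitting = {name: gb for name, gb in sizes.items() if gb <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant(12.0))  # -> "IQ2_XS" with the partial table above
```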