allura-org_Mistral-Small-24b-Sertraline-0304-GGUF

Maintained By: bartowski

  • Base Model: Mistral-Small-24b-Sertraline
  • Quantization Options: Q2-Q8 variants
  • Size Range: 7.21GB - 25.05GB
  • Model URL: huggingface.co/bartowski/allura-org_Mistral-Small-24b-Sertraline-0304-GGUF

What is allura-org_Mistral-Small-24b-Sertraline-0304-GGUF?

This repository collects GGUF quantizations of the Mistral-Small-24b-Sertraline model at a range of compression levels, so users can match file size to their hardware and quality requirements. The quantizations were produced with llama.cpp release b4792 using imatrix calibration.
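
For example, a single quant can be fetched with the huggingface_hub client. The repo id comes from the table above; the exact filename below follows bartowski's usual naming pattern and is an assumption to check against the repository's file listing:

```python
from huggingface_hub import hf_hub_download

# Download one quant file from the repo. The filename is an assumption
# based on bartowski's usual naming scheme -- verify it against the
# repository's file list before running.
model_path = hf_hub_download(
    repo_id="bartowski/allura-org_Mistral-Small-24b-Sertraline-0304-GGUF",
    filename="allura-org_Mistral-Small-24b-Sertraline-0304-Q4_K_M.gguf",
)
print(model_path)  # local cache path of the downloaded GGUF
```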

Implementation Details

The collection spans variants from high-quality Q8_0 (25.05GB) down to the heavily compressed IQ2_XS (7.21GB), each trading off model size, inference speed, and output quality differently. Certain variants (Q3_K_XL, Q4_K_L) additionally quantize the embedding and output weights at Q8_0 instead of the default type, preserving quality in those tensors at the cost of a larger file.
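
As a rough sketch of how such a variant can be produced, llama.cpp's llama-quantize tool accepts per-tensor type overrides. The invocation below is illustrative only, not the maintainer's actual build script; binary name, flags, and file paths are assumptions to verify against your llama.cpp build:

```python
import subprocess

# Illustrative sketch: produce a Q4_K_M-based quant whose token
# embedding and output tensors stay at Q8_0, using an imatrix file.
subprocess.run(
    [
        "llama-quantize",
        "--imatrix", "imatrix.dat",        # importance-matrix calibration data
        "--token-embedding-type", "q8_0",  # keep embeddings at Q8_0
        "--output-tensor-type", "q8_0",    # keep output weights at Q8_0
        "model-F16.gguf",                  # full-precision input GGUF
        "model-Q4_K_L.gguf",               # quantized output file
        "Q4_K_M",                          # base quantization type
    ],
    check=True,
)
```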

  • Supports online repacking for ARM and AVX CPU inference
  • Implements state-of-the-art (SOTA) compression techniques in the IQ variants
  • Includes specialized quantizations for different hardware architectures
  • Features a standardized prompt format with system prompt support (sketched below)
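
The authoritative template lives on the model card; as a hedged sketch, a Mistral-Small-style (V7-Tekken) format looks roughly like this, with the exact control tokens being an assumption to confirm before use:

```python
# Hedged sketch of a Mistral-Small-style prompt template; confirm the
# exact control tokens against the model card before relying on it.
def build_prompt(system_prompt: str, user_prompt: str) -> str:
    return (
        f"<s>[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"
        f"[INST]{user_prompt}[/INST]"
    )

print(build_prompt("You are a concise assistant.", "Summarize GGUF in one line."))
```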

Core Capabilities

  • Multiple quantization options for various hardware configurations
  • Optimized performance on both CPU and GPU platforms
  • Support for different inference backends (cuBLAS, rocBLAS, Metal)
  • Flexible deployment options through llama.cpp-based projects (see the example below)
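
As one such deployment path, the llama-cpp-python bindings can load any of these quants. This is a minimal sketch: the model path is a placeholder, and n_gpu_layers=-1 assumes a GPU-enabled build (cuBLAS, rocBLAS, or Metal):

```python
from llama_cpp import Llama

# Minimal sketch using llama-cpp-python; model_path is a placeholder
# and full GPU offload assumes a GPU-enabled build of the bindings.
llm = Llama(
    model_path="allura-org_Mistral-Small-24b-Sertraline-0304-Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to the GPU when available
)

result = llm("Q: What is GGUF?\nA:", max_tokens=64, stop=["\n"])
print(result["choices"][0]["text"])
```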

Frequently Asked Questions

Q: What makes this model unique?

The model provides an extensive range of quantization options, from extremely high quality (Q8_0) to highly compressed (IQ2_XS), allowing users to choose the optimal balance between model size and performance for their specific hardware constraints.

Q: What are the recommended use cases?

For maximum quality, use Q6_K_L or Q6_K variants. For balanced performance, Q4_K_M is recommended as the default choice. For limited RAM scenarios, the IQ3 and IQ2 variants offer surprisingly usable performance at smaller sizes.
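
A tiny helper makes the trade-off concrete: pick the largest quant that fits your memory budget. Only the two file sizes stated above are filled in; the remaining entries should be copied from the repository's file listing:

```python
# Pick the largest quant that fits a memory budget (sizes in GB).
# Only the Q8_0 and IQ2_XS sizes are documented above; add the other
# variants from the repository's file listing.
QUANT_SIZES_GB = {
    "Q8_0": 25.05,
    "IQ2_XS": 7.21,
}

def pick_quant(budget_gb: float) -> str | None:
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget_gb]
    return max(fitting)[1] if fitting else None

print(pick_quant(12.0))  # -> IQ2_XS
print(pick_quant(32.0))  # -> Q8_0
```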
