allura-org_Mistral-Small-24b-Sertraline-0304-GGUF

By bartowski

24B parameter Mistral model with multiple GGUF quantization options (7-25GB), optimized for different RAM/VRAM constraints and performance needs. Features Q2-Q8 quantization variants.

| Property | Value |
| --- | --- |
| Base Model | Mistral-Small-24b-Sertraline |
| Quantization Options | Q2–Q8 variants |
| Size Range | 7.21GB – 25.05GB |
| Model URL | huggingface.co/bartowski/allura-org_Mistral-Small-24b-Sertraline-0304-GGUF |

What is allura-org_Mistral-Small-24b-Sertraline-0304-GGUF?

This is a comprehensive collection of GGUF quantized versions of the Mistral-Small-24b-Sertraline model, offering various compression levels to accommodate different hardware capabilities and performance requirements. The quantizations were created using llama.cpp release b4792 with imatrix optimization.

Implementation Details

The collection spans multiple quantization variants, from the high-quality Q8_0 (25.05GB) down to the highly compressed IQ2_XS (7.21GB). Each variant offers a different trade-off between file size, inference speed, and output quality. In select variants (Q3_K_XL, Q4_K_L), the embedding and output weights are kept at Q8_0 quantization, trading a small increase in file size for improved quality.

  • Supports online repacking for ARM and AVX CPU inference
  • Implements SOTA compression techniques in IQ variants
  • Includes specialized quantizations for different hardware architectures
  • Features a standardized prompt format with system prompt support
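Given the file sizes quoted in this card, a quick pre-download check can tell you whether a variant will fit your memory budget. The sketch below is illustrative, not an official tool: only the Q8_0 and IQ2_XS sizes come from this card, and the 1.2× headroom factor for KV cache and runtime overhead is a rough assumption.

```python
# Hypothetical helper: check which quant variants fit a RAM/VRAM budget.
# Only the Q8_0 and IQ2_XS file sizes are taken from the model card;
# the headroom multiplier is an assumed rule of thumb, not a measurement.

QUANT_SIZES_GB = {
    "Q8_0": 25.05,   # from the model card
    "IQ2_XS": 7.21,  # from the model card
}

def fits(variant: str, budget_gb: float, headroom: float = 1.2) -> bool:
    """Return True if the variant's file, plus assumed runtime headroom
    for context/KV cache, fits within budget_gb of RAM or VRAM."""
    return QUANT_SIZES_GB[variant] * headroom <= budget_gb

if __name__ == "__main__":
    for name in QUANT_SIZES_GB:
        print(f"{name}: fits in 16GB -> {fits(name, budget_gb=16.0)}")
```

With a 16GB budget, this flags IQ2_XS as viable and Q8_0 as too large, matching the size range in the table above.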

Core Capabilities

  • Multiple quantization options for various hardware configurations
  • Optimized performance on both CPU and GPU platforms
  • Support for different inference backends (cuBLAS, rocBLAS, Metal)
  • Flexible deployment options through llama.cpp-based projects

Frequently Asked Questions

Q: What makes this model unique?

The model provides an extensive range of quantization options, from extremely high quality (Q8_0) to highly compressed (IQ2_XS), allowing users to choose the optimal balance between model size and performance for their specific hardware constraints.

Q: What are the recommended use cases?

For maximum quality, use the Q6_K_L or Q6_K variants. Q4_K_M is the recommended default for balanced performance. In RAM-constrained scenarios, the IQ3 and IQ2 variants offer surprisingly usable quality at much smaller sizes.
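The guidance above amounts to a simple selection rule. The sketch below encodes that decision logic; the GB thresholds and the IQ3_XS variant name are illustrative assumptions, not benchmarked cutoffs from this card.

```python
# Hypothetical quant picker following the use-case guidance above.
# Thresholds are illustrative assumptions, not measured requirements.

def recommend_quant(available_gb: float) -> str:
    """Map an approximate RAM/VRAM budget to a suggested quant tier."""
    if available_gb >= 24:   # enough room for the larger Q6 files
        return "Q6_K_L"
    if available_gb >= 16:   # balanced default recommendation
        return "Q4_K_M"
    if available_gb >= 10:   # low-RAM territory (variant name assumed)
        return "IQ3_XS"
    return "IQ2_XS"          # smallest option listed in this card

if __name__ == "__main__":
    for gb in (32, 16, 8):
        print(gb, "->", recommend_quant(gb))
```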
