SuperNova-Medius-GGUF

Property	Value
Parameter Count	14.8B
License	Apache 2.0
Author	bartowski
Base Model	arcee-ai/SuperNova-Medius

What is SuperNova-Medius-GGUF?

SuperNova-Medius-GGUF is a comprehensive collection of GGUF quantized versions of the SuperNova-Medius language model, optimized using llama.cpp. It offers various quantization levels to balance performance and resource requirements, ranging from the full F16 weights at 29.55GB to highly compressed versions as small as 4.31GB.

Implementation Details

The model uses an advanced quantization methodology with imatrix calibration, providing multiple variants optimized for different hardware configurations and use cases. The implementation includes special considerations for embed/output weights in certain variants, potentially improving output quality.

Multiple quantization options from Q8_0 to IQ2_XXS
Specialized ARM-optimized versions (Q4_0_X_X series)
Custom calibration dataset for optimal performance
Support for various inference endpoints including LM Studio

Core Capabilities

Text generation with chat-style formatting
Flexible deployment options for different hardware configurations
Optimized performance on both CPU and GPU setups
Support for conversation-style interactions using specific prompt format

Frequently Asked Questions

Q: What makes this model unique?

The model's unique feature is its extensive range of quantization options, allowing users to choose the perfect balance between model size and performance for their specific hardware constraints. The implementation includes cutting-edge quantization techniques like I-quants and special handling of embedding weights.

Q: What are the recommended use cases?

For most users, the Q4_K_M variant (8.99GB) is recommended as a balanced option. Users with limited RAM should consider the IQ3/IQ2 series, while those prioritizing quality should opt for Q6_K_L or Q5_K_L variants. The model is particularly well-suited for conversational AI applications and general text generation tasks.