SuperNova-Medius-GGUF

arcee-ai

A 14B-parameter LLM built on Qwen2.5-14B-Instruct and trained with cross-architecture distillation from Llama-3.1-405B-Instruct and Qwen2.5-72B-Instruct. It performs strongly on instruction following and reasoning.

Property	Value
Parameter Count	14.8B
License	Apache-2.0
Base Model	Qwen2.5-14B-Instruct
Author	arcee-ai

What is SuperNova-Medius-GGUF?

SuperNova-Medius-GGUF packages SuperNova-Medius in the GGUF format used by llama.cpp. The model is notable for cross-architecture knowledge distillation: built on Qwen2.5-14B-Instruct, it combines knowledge from two teachers, Qwen2.5-72B-Instruct and Llama-3.1-405B-Instruct, even though the Llama teacher uses a different architecture and tokenizer.

Implementation Details

The model was trained with multi-teacher distillation, combining logit distillation (matching the teachers' output distributions) with hidden-state distillation (matching intermediate representations). Because the teachers come from different architectures with different vocabularies, the vocabularies were aligned using mergekit-tokensurgeon; the model was then fine-tuned on a custom instruction dataset built with EvolKit.
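A minimal sketch of how logit and hidden-state distillation can be combined across several teachers, written in plain Python for readability. It is not Arcee's training code: the temperature, the per-teacher weights, the `hidden_weight` factor, and all function names are illustrative assumptions. It also assumes the teacher logits already cover the student's vocabulary and the hidden states have already been projected to a common size. Real training would compute this per token over batches in a tensor framework.

```python
import math

# Illustrative assumptions: temperature, teacher weights, and hidden_weight are
# hypothetical values; logits share one vocabulary; hidden vectors share one size.

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL from the softened teacher distribution to the softened student one, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return (temperature ** 2) * kl_divergence(p_teacher, p_student)

def hidden_state_loss(student_hidden, teacher_hidden):
    """Mean squared error between two equal-length hidden-state vectors."""
    return sum((s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)) / len(student_hidden)

def multi_teacher_loss(student_logits, student_hidden, teachers, temperature=2.0, hidden_weight=0.5):
    """Weighted sum of per-teacher losses; each teacher is a dict with 'logits', 'hidden', 'weight'."""
    total = 0.0
    for t in teachers:
        loss = logit_distillation_loss(student_logits, t["logits"], temperature)
        loss += hidden_weight * hidden_state_loss(student_hidden, t["hidden"])
        total += t["weight"] * loss
    return total

if __name__ == "__main__":
    # Toy values for two hypothetical teachers over a 3-token vocabulary.
    teachers = [
        {"logits": [2.0, 1.0, 0.1], "hidden": [0.2, 0.4], "weight": 0.5},
        {"logits": [1.5, 1.2, 0.3], "hidden": [0.1, 0.5], "weight": 0.5},
    ]
    print(multi_teacher_loss([1.0, 1.0, 1.0], [0.0, 0.0], teachers))
```

Scaling the KL term by T² is the usual knowledge-distillation convention; it keeps the size of that term comparable when the temperature changes.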

Cross-architecture distillation from two teacher models (Llama-3.1-405B-Instruct and Qwen2.5-72B-Instruct)
Vocabulary alignment across tokenizers with mergekit-tokensurgeon
Fine-tuning on a custom instruction dataset
Compact 14B-parameter footprint
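Logit distillation compares probability distributions token by token, so student and teacher must score the same vocabulary. The sketch below illustrates the general idea of vocabulary transplantation with the simplest possible heuristic: copy embeddings for tokens both vocabularies share, and initialize each new token as the mean of the embeddings of the pieces the old tokenizer splits it into. This is not mergekit-tokensurgeon's algorithm; the function names, the greedy longest-match tokenizer, and the toy vocabularies are hypothetical stand-ins.

```python
def greedy_tokenize(text, vocab):
    """Hypothetical stand-in tokenizer: greedy longest match against a {string: id} vocab."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            i += 1  # skip a character the vocab cannot cover
    return ids

def transplant_embeddings(old_vocab, old_embeddings, new_vocab, tokenize_with_old):
    """Build an embedding table indexed by the new vocabulary's ids.

    Shared tokens keep their existing vector; new tokens get the mean of the
    vectors of their old-tokenizer pieces, or zeros if nothing matches.
    """
    dim = len(old_embeddings[0])
    new_embeddings = [None] * len(new_vocab)
    for token, new_id in new_vocab.items():
        if token in old_vocab:
            new_embeddings[new_id] = list(old_embeddings[old_vocab[token]])
        else:
            pieces = tokenize_with_old(token)
            if pieces:
                new_embeddings[new_id] = [
                    sum(old_embeddings[p][d] for p in pieces) / len(pieces) for d in range(dim)
                ]
            else:
                new_embeddings[new_id] = [0.0] * dim
    return new_embeddings

if __name__ == "__main__":
    old_vocab = {"a": 0, "b": 1, "ab": 2}
    old_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    new_vocab = {"ab": 0, "ba": 1, "z": 2}
    table = transplant_embeddings(
        old_vocab, old_embeddings, new_vocab, lambda s: greedy_tokenize(s, old_vocab)
    )
    print(table)
```

Averaging sub-token embeddings is only a starting point; dedicated tools use more careful approximations, and the transplanted model normally needs further training.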

Core Capabilities

Instruction following: 0.832 on IFEval
Complex reasoning: 0.631 on BBH
Customer support and technical assistance
Content creation and generation
Resource-efficient deployment
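A back-of-the-envelope estimate shows why quantized GGUF files matter for a model of this size. The sketch multiplies the parameter count from the table by a nominal bit width. It assumes every weight uses the same width, whereas real GGUF quantization types mix widths and store scale factors, and it ignores the KV cache and runtime overhead, so actual memory use will be higher.

```python
PARAMS = 14.8e9  # parameter count listed in the model table

def weight_size_gb(params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

# Nominal uniform bit widths; real GGUF quant types (e.g. Q4_K_M) do not match these exactly.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_size_gb(PARAMS, bits):.1f} GB")
# prints:
# 16-bit: ~29.6 GB
# 8-bit: ~14.8 GB
# 4-bit: ~7.4 GB
```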

Frequently Asked Questions

Q: What makes this model unique?

Its distinguishing feature is cross-architecture distillation: it combines knowledge from both Llama and Qwen teacher models while keeping a relatively compact 14B-parameter size. The result is performance competitive with larger models in a package that is easier to deploy.

Q: What are the recommended use cases?

SuperNova-Medius is well suited to customer support automation, technical content creation, and complex reasoning tasks. Its balance of capability and size makes it a good fit for organizations that want strong performance without the hardware requirements of larger models.