# Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-GGML
| Property | Value |
|---|---|
| Base Model | Vicuna 13B |
| Context Length | 8192 tokens |
| Quantization Options | 2-bit to 8-bit GGML |
| Model Type | Uncensored Language Model |
| Author | TheBloke (GGML) / Eric Hartford (Original) |
| Hub Link | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-GGML |
## What is Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-GGML?
This is a GGML-quantized version of Eric Hartford's Wizard Vicuna 13B Uncensored model, combined with kaiokendev's SuperHOT work to extend the usable context length from 2,048 to 8,192 tokens. The model was trained without alignment constraints, leaving safety behavior entirely to the deployer. SuperHOT achieves the longer context through modified RoPE (Rotary Position Embedding): positions are linearly interpolated so that the extended range maps back into the range the base model was trained on.
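The position-interpolation idea can be illustrated in a few lines. The sketch below is an illustration, not the model's actual inference code; the head dimension and frequency base are typical Llama defaults. Scaling positions by 2048/8192 = 0.25 maps position 8191 onto the in-distribution position 2047.75:

```python
def rope_angles(position, dim=128, base=10000.0, scale=1.0):
    """Rotation angles RoPE would apply at one position.

    scale < 1 implements linear position interpolation: the position is
    squeezed back into the range the base model saw during training.
    """
    pos = position * scale
    return [pos * base ** (-2.0 * i / dim) for i in range(dim // 2)]

# Without interpolation, position 8191 lies far outside the 0..2047
# range a 2K-context base model was trained on.
plain = rope_angles(8191)

# SuperHOT-style linear interpolation with scale = 2048/8192 = 0.25:
# position 8191 now produces the same angles as position 2047.75.
interp = rope_angles(8191, scale=0.25)
```

The key point is that the interpolated angles stay inside the distribution the base model has already learned, which is why a short fine-tune (the SuperHOT LoRA) suffices to make the longer context usable.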
## Implementation Details
The model is available in multiple GGML quantization formats, ranging from 2-bit to 8-bit precision, offering different trade-offs between model size and performance. The implementation supports various optimization techniques including k-quant methods and tensor-specific quantization strategies.
- Supports context lengths up to 8192 tokens via SuperHOT's RoPE position interpolation
- Multiple quantization options, from 5.51 GB (q2_K) to 10.68 GB (q6_K)
- Requires KoboldCpp v1.33 or later to make use of the extended context
- Includes the newer k-quant methods for a better quality/size trade-off
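The published file sizes imply an effective bits-per-weight figure for each quantization level. The sketch below is a rough estimate only: it assumes approximately 13.0 billion parameters and decimal gigabytes, and ignores tensors (such as embeddings) that GGML may store at higher precision:

```python
PARAMS_13B = 13.0e9  # approximate parameter count of a 13B Llama model

def effective_bits_per_weight(file_size_gb: float) -> float:
    """Rough effective bits/weight implied by a GGML file size (decimal GB)."""
    return file_size_gb * 1e9 * 8 / PARAMS_13B

q2 = effective_bits_per_weight(5.51)   # q2_K: roughly 3.4 bits/weight
q6 = effective_bits_per_weight(10.68)  # q6_K: roughly 6.6 bits/weight
```

The q2_K figure comes out higher than 2 bits because k-quant formats keep some tensors at higher precision to limit quality loss.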
## Core Capabilities
- Extended context processing up to 8K tokens
- Efficient CPU inference through GGML quantization
- Flexible response generation without built-in alignment constraints
- GPU acceleration support through layer offloading
- Various precision options for different hardware configurations
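A typical KoboldCpp launch combining the extended context with partial GPU offloading might look like the following. This is a hedged example: the filename is illustrative, and the flags should be checked against `--help` for your KoboldCpp version.

```shell
# Launch KoboldCpp with the 8K context window and some layers on the GPU.
# The model filename below is an example, not a guaranteed exact name.
python koboldcpp.py wizard-vicuna-13b-uncensored-superhot-8k.ggmlv3.q4_K_M.bin \
  --contextsize 8192 \
  --gpulayers 20
```

Lower-precision quantizations (e.g. q2_K) fit on smaller machines at some quality cost, while q6_K tracks the original weights most closely.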
## Frequently Asked Questions
Q: What makes this model unique?
This model combines three key features: it builds on the strong Wizard-Vicuna instruction-tuned base, removes alignment constraints for maximum flexibility, and adds SuperHOT position interpolation to extend context handling to 8K tokens.
Q: What are the recommended use cases?
The model is particularly suitable for applications requiring long-context processing, CPU-based inference, or a custom alignment layer implemented by the deployer. As an uncensored model, it should be deployed responsibly, with appropriate safeguards added around it.