# Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-GGML
| Property | Value |
|---|---|
| Base Model | Vicuna 13B |
| Context Length | 8192 tokens |
| Quantization Options | 2-bit to 8-bit GGML |
| Model Type | Uncensored Language Model |
| Author | TheBloke (GGML) / Eric Hartford (Original) |
| Hub Link | https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-GGML |
## What is Wizard-Vicuna-13B-Uncensored-SuperHOT-8K-GGML?
This is a GGML-quantized version of Eric Hartford's Wizard Vicuna 13B Uncensored model, combined with kaiokendev's SuperHOT work to extend the usable context length from 2,048 to 8,192 tokens. The model was trained without alignment constraints, leaving safety behavior entirely to the deployer. SuperHOT achieves the longer context through modified RoPE (Rotary Position Embedding): positions are linearly interpolated so that the extended range maps back into the range the base model was trained on.
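The position-interpolation idea can be illustrated in a few lines. The sketch below is an illustration, not the model's actual inference code; the head dimension and frequency base are typical Llama defaults. Scaling positions by 2048/8192 = 0.25 maps position 8191 onto the in-distribution position 2047.75:

```python
def rope_angles(position, dim=128, base=10000.0, scale=1.0):
    """Rotation angles RoPE would apply at one position.

    scale < 1 implements linear position interpolation: the position is
    squeezed back into the range the base model saw during training.
    """
    pos = position * scale
    return [pos * base ** (-2.0 * i / dim) for i in range(dim // 2)]

# Without interpolation, position 8191 lies far outside the 0..2047
# range a 2K-context base model was trained on.
plain = rope_angles(8191)

# SuperHOT-style linear interpolation with scale = 2048/8192 = 0.25:
# position 8191 now produces the same angles as position 2047.75.
interp = rope_angles(8191, scale=0.25)
```

The key point is that the interpolated angles stay inside the distribution the base model has already learned, which is why a short fine-tune (the SuperHOT LoRA) suffices to make the longer context usable.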
## Implementation Details
The model is available in multiple GGML quantization formats, ranging from 2-bit to 8-bit precision, offering different trade-offs between model size and performance. The implementation supports various optimization techniques including k-quant methods and tensor-specific quantization strategies.
- Supports context lengths up to 8192 tokens via SuperHOT's RoPE position interpolation
- Multiple quantization options, from 5.51 GB (q2_K) to 10.68 GB (q6_K)
- Requires KoboldCpp v1.33 or later to make use of the extended context
- Includes the newer k-quant methods for a better quality/size trade-off
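The published file sizes imply an effective bits-per-weight figure for each quantization level. The sketch below is a rough estimate only: it assumes approximately 13.0 billion parameters and decimal gigabytes, and ignores tensors (such as embeddings) that GGML may store at higher precision:

```python
PARAMS_13B = 13.0e9  # approximate parameter count of a 13B Llama model

def effective_bits_per_weight(file_size_gb: float) -> float:
    """Rough effective bits/weight implied by a GGML file size (decimal GB)."""
    return file_size_gb * 1e9 * 8 / PARAMS_13B

q2 = effective_bits_per_weight(5.51)   # q2_K: roughly 3.4 bits/weight
q6 = effective_bits_per_weight(10.68)  # q6_K: roughly 6.6 bits/weight
```

The q2_K figure comes out higher than 2 bits because k-quant formats keep some tensors at higher precision to limit quality loss.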
## Core Capabilities
- Extended context processing up to 8K tokens
- Efficient CPU inference through GGML quantization
- Flexible response generation without built-in alignment constraints
- GPU acceleration support through layer offloading
- Various precision options for different hardware configurations
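A typical KoboldCpp launch combining the extended context with partial GPU offloading might look like the following. This is a hedged example: the filename is illustrative, and the flags should be checked against `--help` for your KoboldCpp version.

```shell
# Launch KoboldCpp with the 8K context window and some layers on the GPU.
# The model filename below is an example, not a guaranteed exact name.
python koboldcpp.py wizard-vicuna-13b-uncensored-superhot-8k.ggmlv3.q4_K_M.bin \
  --contextsize 8192 \
  --gpulayers 20
```

Lower-precision quantizations (e.g. q2_K) fit on smaller machines at some quality cost, while q6_K tracks the original weights most closely.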
## Frequently Asked Questions
Q: What makes this model unique?
This model combines three key features: it builds on the strong Wizard-Vicuna instruction-tuned base, removes alignment constraints for maximum flexibility, and adds SuperHOT position interpolation to extend context handling to 8K tokens.
Q: What are the recommended use cases?
The model is particularly suitable for applications requiring long-context processing, CPU-based inference, or a custom alignment layer implemented by the deployer. As an uncensored model, it should be deployed responsibly, with appropriate safeguards added around it.