Wizard-Vicuna-13B-Uncensored-GGML

Property	Value
Author	TheBloke
Base Model	Wizard-Vicuna-13B-Uncensored
License	Other
Format	GGML (Various Quantizations)

What is Wizard-Vicuna-13B-Uncensored-GGML?

This is a GGML-format conversion of Eric Hartford's Wizard-Vicuna-13B-Uncensored model, intended for CPU inference with optional GPU offloading using llama.cpp and compatible libraries and UIs. The model is provided in multiple quantizations ranging from 2-bit to 8-bit, allowing users to trade output quality against file size and memory use.

Implementation Details

The model comes in various quantization formats, including both the original llama.cpp methods (q4_0, q4_1, q5_0, q5_1, q8_0) and the newer k-quant methods (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K). File sizes range from 5.43GB to 13.83GB, with RAM requirements between 7.93GB and 16.33GB; at both ends of the range, the RAM figure is the file size plus 2.5GB.
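The relationship between file size and RAM requirement can be checked with a little arithmetic. The following is a minimal sketch, not part of the original model card: the function names are hypothetical, and applying the same 2.5GB overhead to files between the two quoted endpoints is an assumption, since only the endpoints are given here.

```python
# Endpoints quoted above: (file size in GB, RAM required in GB).
SMALLEST = (5.43, 7.93)
LARGEST = (13.83, 16.33)


def overhead_gb(file_gb: float, ram_gb: float) -> float:
    """RAM needed beyond the model file itself, rounded to 2 decimals."""
    return round(ram_gb - file_gb, 2)


def estimated_ram_gb(file_gb: float, overhead: float = 2.5) -> float:
    """Estimate RAM for a file, ASSUMING the same fixed overhead applies."""
    return round(file_gb + overhead, 2)


if __name__ == "__main__":
    # Both endpoints show the same overhead.
    print(overhead_gb(*SMALLEST), overhead_gb(*LARGEST))
    # Hypothetical 9GB file, purely for illustration.
    print(estimated_ram_gb(9.0))
```

Treat the result as a rough planning figure rather than a guarantee; actual memory use also depends on context length and on how many layers, if any, are offloaded to a GPU.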

Multiple quantization options for different use-cases
Compatible with llama.cpp and various UI implementations
Supports offloading layers to the GPU to speed up inference
Trained on data with alignment and moralizing responses removed
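As an illustration of GPU layer offloading, the sketch below builds a command line for llama.cpp's `main` example program. It rests on several assumptions: a llama.cpp build from the period when GGML files were supported, compiled with GPU support; the `-m`, `-t`, `-c`, `-ngl`, and `-p` flags of that program; and a placeholder model file name. The helper name `build_llama_cpp_command` is hypothetical, and the command is only printed, not executed.

```python
import shlex


def build_llama_cpp_command(model_path, gpu_layers=0, threads=8,
                            context=2048, prompt="Hello"):
    """Return an argv list for llama.cpp's `main` example binary."""
    args = ["./main", "-m", model_path, "-t", str(threads), "-c", str(context)]
    if gpu_layers > 0:
        # -ngl sets how many model layers are offloaded to the GPU.
        args += ["-ngl", str(gpu_layers)]
    args += ["-p", prompt]
    return args


if __name__ == "__main__":
    # "model.q4_0.bin" is a placeholder, not the real file name.
    cmd = build_llama_cpp_command("model.q4_0.bin", gpu_layers=32)
    print(shlex.join(cmd))
```

Leaving `gpu_layers` at 0 omits the flag entirely, which corresponds to CPU-only inference. How many layers fit depends on available VRAM, so the value 32 is only an example.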

Core Capabilities

CPU and GPU inference support
Flexible memory usage options through different quantizations
Integration with popular frameworks like text-generation-webui and KoboldCpp
Raw, uncensored outputs without built-in alignment constraints

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for two things: it was trained on data from which alignment and moralizing responses were removed, and it is available in many quantizations, so it can be fitted to a wide range of hardware. Because no alignment is built in, users can add their own separately.

Q: What are the recommended use cases?

The model is suited to research and development work where control over alignment is desired. Choose a quantization to match the hardware: the lighter options (q2_K and the q3_K variants) for limited memory, and the heavier options (q5_K_S, q6_K, q8_0) for the best quality when resources permit.
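The selection rule can be sketched as a small helper that picks the heaviest quantization whose RAM requirement fits the memory available. This is only an illustration under stated assumptions: the function name is hypothetical, RAM requirement is used as a stand-in for quality, and the table contains just the two endpoint figures quoted earlier, paired with q2_K and q8_0 on the assumption that these are the smallest and largest files. Fill in the remaining entries from the model repository before relying on it.

```python
def pick_quantization(available_ram_gb, ram_by_quant):
    """Return the quant with the largest RAM requirement that still fits.

    Returns None when nothing fits in the available RAM.
    """
    fitting = {q: r for q, r in ram_by_quant.items() if r <= available_ram_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# ASSUMPTION: only the endpoints are known from the text above, and their
# pairing with q2_K and q8_0 is assumed. Values are RAM required in GB.
RAM_BY_QUANT = {"q2_K": 7.93, "q8_0": 16.33}

if __name__ == "__main__":
    for ram in (4, 8, 32):
        print(ram, pick_quantization(ram, RAM_BY_QUANT))
```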