WizardLM-13B-Uncensored-GGML
| Property | Value |
|---|---|
| Author | TheBloke |
| Base Model | WizardLM 13B Uncensored |
| Format | GGML |
| License | Other |
What is WizardLM-13B-Uncensored-GGML?
WizardLM-13B-Uncensored-GGML is a conversion of Eric Hartford's WizardLM 13B Uncensored model into the GGML format, optimized for CPU inference with optional GPU offloading. Its distinguishing feature is the range of quantization options, from 2-bit to 8-bit, letting users trade off model size, inference speed, and accuracy to suit their hardware.
Implementation Details
The model is available in various quantization formats, including both original llama.cpp methods (q4_0, q4_1, q5_0, q5_1, q8_0) and new k-quant methods (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K). File sizes range from 5.43GB for the q2_K version to 13.83GB for the q8_0 version.
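As a rough sanity check on these figures, the effective bits stored per weight can be estimated from the file size and the parameter count. The sketch below assumes an approximate count of 13 billion parameters and decimal gigabytes; both are assumptions, not values stated by the release.

```python
# Rough effective bits-per-weight estimate: file_size_bytes * 8 / n_params.
# Assumes ~13e9 parameters and decimal GB, both approximations.

def bits_per_weight(file_size_gb: float, n_params: float = 13e9) -> float:
    """Estimate the average number of bits stored per weight in a quantized file."""
    return file_size_gb * 1e9 * 8 / n_params

q2_bpw = bits_per_weight(5.43)   # q2_K file, 5.43 GB
q8_bpw = bits_per_weight(13.83)  # q8_0 file, 13.83 GB
print(f"q2_K ~ {q2_bpw:.2f} bits/weight, q8_0 ~ {q8_bpw:.2f} bits/weight")
```

Both estimates land above the nominal bit width (about 3.34 for q2_K and 8.51 for q8_0), which is expected: quantized blocks carry scale metadata on top of the packed weights, and the GB figures may be decimal rather than binary.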
- Supports multiple inference frameworks including text-generation-webui, KoboldCpp, and llama-cpp-python
- Requires between 7.93GB and 16.33GB of RAM depending on the quantization level
- Offers newer k-quant methods that improve the quality-to-size trade-off
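The RAM figures track the file sizes closely: both stated endpoints sit exactly 2.5GB above the corresponding file size, suggesting a simple "whole file plus fixed overhead" model. A minimal sketch of that estimate follows; the 2.5GB overhead constant is inferred from the two quoted endpoints, not a documented value.

```python
# Estimated RAM needed to run a GGML file: model size plus a fixed overhead.
# The 2.5 GB overhead is inferred from the quoted figures
# (7.93 - 5.43 and 16.33 - 13.83), not an official constant.
OVERHEAD_GB = 2.5

def ram_required_gb(file_size_gb: float) -> float:
    """Estimate total RAM (GB) needed to load and run a quantized model file."""
    return round(file_size_gb + OVERHEAD_GB, 2)

print(ram_required_gb(5.43))   # q2_K -> 7.93
print(ram_required_gb(13.83))  # q8_0 -> 16.33
```

Actual usage also depends on context length and any layers offloaded to the GPU, so treat this as a lower bound when sizing hardware.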
Core Capabilities
- Uncensored text generation without built-in alignment constraints
- Efficient CPU+GPU inference with various quantization options
- Context window of 2048 tokens
- Compatible with multiple popular inference frameworks
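Because the context window is fixed at 2048 tokens, the prompt and the generated output must share that budget. The helper below is a hypothetical illustration (not part of any framework's API) of clamping a generation request to fit:

```python
CONTEXT_WINDOW = 2048  # token limit stated for this model

def max_new_tokens(prompt_tokens: int, requested: int) -> int:
    """Clamp a generation request so prompt + output fits in the context window."""
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError("prompt already fills the context window")
    return min(requested, CONTEXT_WINDOW - prompt_tokens)

print(max_new_tokens(1500, 1000))  # only 548 tokens of budget remain
```

Frameworks such as llama.cpp expose equivalent limits through their own context-size settings; exceeding them typically truncates or rejects the prompt.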
Frequently Asked Questions
Q: What makes this model unique?
The model offers considerable deployment flexibility through its multiple quantization options while retaining the capabilities of WizardLM 13B Uncensored. It is specifically packaged for efficient inference on consumer hardware.
Q: What are the recommended use cases?
The model suits users who need uncensored text generation with flexible deployment options, particularly those weighing model size, speed, and accuracy against the resource constraints of consumer-grade hardware.