WizardLM-33B-V1-0-Uncensored-SuperHOT-8K-GPTQ

Maintained By
TheBloke


  • Model Size: 33B Parameters
  • Quantization: 4-bit GPTQ
  • Context Length: 8192 tokens
  • Author: TheBloke
  • Model URL: TheBloke/WizardLM-33B-V1-0-Uncensored-SuperHOT-8K-GPTQ

What is WizardLM-33B-V1-0-Uncensored-SuperHOT-8K-GPTQ?

This model is a merge of WizardLM 33B V1.0 Uncensored and SuperHOT 8K, quantized to 4-bit precision with GPTQ. The quantization keeps memory requirements low, while the SuperHOT merge extends the usable context window to 8192 tokens.

Implementation Details

The model is quantized to 4 bits with no group size, which minimizes VRAM requirements, and with act-order enabled, which improves quantization accuracy. It is compatible with multiple frameworks, including AutoGPTQ, ExLlama, and the CUDA branch of GPTQ-for-LLaMa.

  • 4-bit quantization with act-order optimization
  • 8K context length support through SuperHOT technology
  • Safetensors format for improved loading and security
  • Compatible with text-generation-webui and various Python implementations
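As a rough sketch, the model can be loaded with AutoGPTQ (one of the compatible frameworks listed above). The argument names follow auto-gptq's `from_quantized` API; the device and `use_triton` settings here are illustrative assumptions, not requirements from the card:

```python
MODEL_ID = "TheBloke/WizardLM-33B-V1-0-Uncensored-SuperHOT-8K-GPTQ"

def load_model(device="cuda:0"):
    """Load the 4-bit GPTQ checkpoint (requires auto-gptq, transformers,
    and a CUDA GPU). Argument names follow auto-gptq's from_quantized API."""
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
    model = AutoGPTQForCausalLM.from_quantized(
        MODEL_ID,
        use_safetensors=True,  # the card ships safetensors weights
        device=device,
        use_triton=False,      # plain CUDA kernels; Triton is optional
    )
    return tokenizer, model
```

In text-generation-webui the same checkpoint can instead be selected from the model dropdown and loaded with the ExLlama or AutoGPTQ loader.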

Core Capabilities

  • Extended context processing up to 8192 tokens
  • Efficient memory usage through 4-bit quantization
  • High-accuracy inference despite compression
  • Flexible deployment options across multiple frameworks
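The extended context comes from SuperHOT's linear interpolation of rotary position embeddings: positions in the 8K window are compressed back into the range the base model was trained on. The core idea reduces to a single scaling factor, sketched below (a minimal illustration of the technique; the helper function is hypothetical, and loader flag names such as `compress_pos_emb` belong to text-generation-webui and may differ elsewhere):

```python
# SuperHOT extends LLaMA's 2048-token training context to 8192 by
# linearly interpolating rotary position embeddings.
TRAINED_CONTEXT = 2048  # base LLaMA context length
TARGET_CONTEXT = 8192   # SuperHOT extended context length

# This ratio is what loaders expose as compress_pos_emb (or similar).
scale = TARGET_CONTEXT / TRAINED_CONTEXT  # 4.0

def interpolate_position(pos: int, scale: float) -> float:
    """Map an extended-context position back into the trained range."""
    return pos / scale

# The last position of the 8K window (8191) maps to 2047.75, which
# lies inside the 0-2047 range the base model saw during training.
```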

Frequently Asked Questions

Q: What makes this model unique?

This model combines the capabilities of WizardLM Uncensored with SuperHOT's extended context window, while GPTQ quantization keeps its memory footprint manageable. The combination makes long-context inference practical in resource-constrained environments where an unquantized 33B model would not fit.

Q: What are the recommended use cases?

The model is ideal for applications requiring extended context processing while operating under memory constraints. It's particularly suitable for text generation, analysis, and processing of longer documents, though users should note that utilizing the full 8K context on a 33B model requires substantial VRAM (>24GB).
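The >24GB figure can be sanity-checked with back-of-envelope arithmetic, assuming standard LLaMA-33B shapes (60 layers, hidden size 6656) and an fp16 KV cache; actual usage varies by loader and runtime overhead:

```python
# Rough VRAM estimate for the full 8K context (assumed LLaMA-33B shapes).
PARAMS = 33e9
WEIGHT_BITS = 4  # GPTQ 4-bit weights

weights_gb = PARAMS * WEIGHT_BITS / 8 / 1e9  # ~16.5 GB of weights

LAYERS, HIDDEN, SEQ_LEN, KV_BYTES = 60, 6656, 8192, 2  # fp16 cache
# Factor of 2 covers both the K and V caches.
kv_cache_gb = 2 * LAYERS * HIDDEN * SEQ_LEN * KV_BYTES / 1e9  # ~13.1 GB

total_gb = weights_gb + kv_cache_gb
print(f"weights ~{weights_gb:.1f} GB + KV cache ~{kv_cache_gb:.1f} GB "
      f"= ~{total_gb:.1f} GB")
```

The ~30 GB total explains why the full 8K context on this 33B model exceeds a single 24GB card, even though the weights alone fit comfortably.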
