WizardLM-33B-V1-0-Uncensored-SuperHOT-8K-GPTQ

TheBloke

A 33B parameter GPTQ-quantized LLM merging WizardLM Uncensored with SuperHOT 8K. Features 8K context window, 4-bit quantization, and enhanced inference accuracy.

Property        Value
Model Size      33B Parameters
Quantization    4-bit GPTQ
Context Length  8192 tokens
Author          TheBloke
Model URL       TheBloke/WizardLM-33B-V1-0-Uncensored-SuperHOT-8K-GPTQ

What is WizardLM-33B-V1-0-Uncensored-SuperHOT-8K-GPTQ?

This model merges WizardLM 33B V1.0 Uncensored with SuperHOT 8K, then quantizes the result to 4-bit precision using GPTQ. Quantization keeps memory requirements manageable for a 33B model, while SuperHOT extends the context window from the base model's 2048 tokens to 8192.
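SuperHOT's context extension works by position interpolation: position ids are compressed by a fixed scale factor before the rotary embedding is computed, so an 8192-token sequence maps into the 0–2047 position range the base model was trained on. A minimal sketch of the idea (the scale factor of 4 and head dimension of 128 are the commonly used values for this model family, not read from the model config):

```python
import math

def rope_angles(position: float, dim: int = 128, base: float = 10000.0):
    """Rotary embedding angles for one position (illustrative; dim=128 is the
    usual LLaMA head dimension)."""
    return [position / base ** (2 * i / dim) for i in range(dim // 2)]

def interpolated_position(position: int, scale: float = 4.0) -> float:
    """SuperHOT-style position interpolation: divide position ids by `scale`
    so an 8K sequence fits inside the base model's trained 2048-token range."""
    return position / scale

# The last token of an 8K context lands within the original trained range:
assert interpolated_position(8191) < 2048
```

Because positions are only rescaled, no architecture change is needed; the backend just has to apply the same scale at inference time.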

Implementation Details

The model is quantized without a group size to minimize VRAM requirements, and with act-order enabled to preserve inference accuracy. It is compatible with multiple backends, including AutoGPTQ, ExLlama, and the CUDA branch of GPTQ-for-LLaMa.

  • 4-bit quantization with act-order optimization
  • 8K context length support through SuperHOT technology
  • Safetensors format for improved loading and security
  • Compatible with text-generation-webui and various Python implementations
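A minimal loading sketch using AutoGPTQ (an assumption-laden example, not a verified recipe: it presumes a CUDA GPU with sufficient VRAM, the auto-gptq and transformers packages installed, and that the backend applies SuperHOT's RoPE compression, e.g. compress_pos_emb = 4 in text-generation-webui, to actually use the 8K context):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "TheBloke/WizardLM-33B-V1-0-Uncensored-SuperHOT-8K-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",
    use_safetensors=True,  # weights ship in safetensors format
)

# Vicuna-style USER/ASSISTANT prompt, commonly used with WizardLM Uncensored
prompt = "USER: Summarize the theory of relativity.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=128, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```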

Core Capabilities

  • Extended context processing up to 8192 tokens
  • Efficient memory usage through 4-bit quantization
  • High-accuracy inference despite compression
  • Flexible deployment options across multiple frameworks

Frequently Asked Questions

Q: What makes this model unique?

This model combines the capabilities of WizardLM Uncensored with SuperHOT's extended context window, while maintaining efficiency through GPTQ quantization. The unique combination offers both performance and practical utility for resource-constrained environments.

Q: What are the recommended use cases?

The model is ideal for applications requiring extended context processing while operating under memory constraints. It's particularly suitable for text generation, analysis, and processing of longer documents, though users should note that utilizing the full 8K context on a 33B model requires substantial VRAM (>24GB).
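The ">24GB" figure can be sanity-checked with back-of-envelope arithmetic (a rough sketch only; the layer and hidden sizes below are the standard LLaMA-33B shapes, and real usage adds activation and backend overhead):

```python
# 4-bit weights: 33B parameters at 0.5 bytes each
params = 33_000_000_000
weight_bytes = params * 4 // 8            # ~16.5 GB

# fp16 KV cache at the full 8K context (K and V per layer)
layers, hidden, seq = 60, 6656, 8192      # standard LLaMA-33B shapes
kv_cache_bytes = 2 * layers * hidden * seq * 2

total_gb = (weight_bytes + kv_cache_bytes) / 1e9
print(f"~{total_gb:.1f} GB")              # comfortably above 24 GB
```

This is why a single 24 GB card can run the model at shorter contexts but falls short of the full 8K window.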
