# WizardLM-Uncensored-Falcon-7B-GPTQ
| Property | Value |
|---|---|
| Parameter Count | ~7B (stored as 4-bit weights) |
| Model Type | Text Generation |
| License | Apache 2.0 |
| Quantization | 4-bit GPTQ |
| Architecture | Falcon-based Transformer |
## What is WizardLM-Uncensored-Falcon-7B-GPTQ?
This model is a 4-bit GPTQ quantization of Eric Hartford's WizardLM-Uncensored-Falcon-7B, prepared for GPU inference with AutoGPTQ. Quantization trades a small amount of generation quality for a much smaller memory footprint, letting the 7B model run on consumer GPUs.
## Implementation Details
The model uses GPTQ quantization with a group size of 64 to preserve inference quality, and deliberately omits desc_act (act-order) to keep inference fast. It requires AutoGPTQ 0.2.0 or later and stores weights across several tensor types, including I32, BF16, and FP16 (a loading sketch follows the list below).
- Optimized 4-bit quantization for a reduced memory footprint
- Group size of 64 to balance quality against speed
- Pre-compiled wheels available for CUDA toolkit 11.7/11.8
- Requires trust_remote_code at load time, since Falcon ships custom modelling code
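
As a rough sketch of the loading path, the snippet below uses AutoGPTQ's `from_quantized` API. The repo id, `model_basename`, and the `use_safetensors` flag are illustrative assumptions; match them to the actual files in the checkpoint you download.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Assumed Hugging Face repo id and file basename; adjust to the checkpoint you use.
model_id = "TheBloke/WizardLM-Uncensored-Falcon-7B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# trust_remote_code is required because Falcon ships custom modelling code.
# The checkpoint was quantized without desc_act, so no act-order handling applies.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    model_basename="gptq_model-4bit-64g",  # assumed file name
    device="cuda:0",
    use_safetensors=True,
    use_triton=False,
    trust_remote_code=True,
)

prompt = "What is a falcon?\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Setting `use_triton=False` selects AutoGPTQ's CUDA kernels, which is what the pre-compiled 11.7/11.8 wheels target.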
## Core Capabilities
- Uncensored text generation without built-in alignment constraints
- Efficient GPU inference with reduced memory requirements
- Compatible with text-generation-webui interface
- Supports the WizardLM prompt format for instruction-style requests (see the template sketch after this list)
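
The WizardLM format is a single-turn template that places a "### Response:" cue after the instruction. A minimal helper, assuming that layout:

```python
def build_wizardlm_prompt(instruction: str) -> str:
    """Wrap a user instruction in the single-turn WizardLM template."""
    return f"{instruction}\n### Response:"

print(build_wizardlm_prompt("What is a falcon? Can I keep one as a pet?"))
```

In text-generation-webui, the equivalent is selecting a WizardLM-style instruction template rather than hand-building the prompt string.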
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its uncensored nature and efficient quantization, allowing for unrestricted text generation while maintaining reasonable hardware requirements. It's specifically designed to serve as a base for custom alignment through additional fine-tuning.
**Q: What are the recommended use cases?**
The model is best suited to research and development settings that require unrestricted text generation. Users are responsible for adding appropriate safeguards and content filtering for their specific applications.