WizardLM-Uncensored-Falcon-40B-GPTQ

Maintained By
TheBloke

Property          Value
Parameter Count   6.17B
License           Apache 2.0
Quantization      4-bit GPTQ
Format            SafeTensors

What is WizardLM-Uncensored-Falcon-40B-GPTQ?

This is a 4-bit GPTQ quantized version of Eric Hartford's WizardLM Uncensored Falcon 40B model, optimized for efficient GPU inference. The base model was deliberately trained without the usual alignment constraints, so that alignment can instead be added on top through subsequent fine-tuning.

Implementation Details

The model was quantized with AutoGPTQ and requires a specific setup for inference: AutoGPTQ v0.2.1 and PyTorch 2.0.0 with CUDA 11.7 or 11.8 support. The quantization was produced without group_size to minimize VRAM usage, and with desc_act (act-order) enabled for improved inference accuracy.

  • Supports multiple tensor types: I32, BF16, FP16
  • Requires trust_remote_code for Falcon architecture support
  • Implements WizardLM prompt format
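The requirements above can be sketched as a minimal loading routine. This is an illustrative sketch, not the maintainer's reference code: it assumes AutoGPTQ (v0.2.1 or compatible) and Transformers are installed, that the repository id follows TheBloke's naming, and that a CUDA GPU with sufficient VRAM is available.

```python
# Minimal loading sketch (assumptions: AutoGPTQ and Transformers installed,
# CUDA GPU available; verify exact arguments against the model card).
try:
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM
except ImportError:
    # Libraries not installed; the sketch below documents intended usage.
    AutoTokenizer = AutoGPTQForCausalLM = None

MODEL_ID = "TheBloke/WizardLM-Uncensored-Falcon-40B-GPTQ"

def load_model():
    """Load the 4-bit quantized model onto the first CUDA device."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoGPTQForCausalLM.from_quantized(
        MODEL_ID,
        use_safetensors=True,     # weights ship in SafeTensors format
        trust_remote_code=True,   # required for the Falcon architecture
        device="cuda:0",
    )
    return tokenizer, model
```

Because desc_act is enabled without group_size, older GPTQ loaders may not support the files; AutoGPTQ is the assumed-compatible path here.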

Core Capabilities

  • Efficient GPU inference with 4-bit precision
  • Uncensored text generation without built-in alignment constraints
  • Compatible with text-generation-webui
  • Supports custom prompt templates
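Since the card notes the model implements the WizardLM prompt format, a small helper shows what that format looks like. This is a sketch based on the WizardLM convention (instruction followed by a "### Response:" cue); the function name is illustrative.

```python
# Build a prompt in the WizardLM format: the instruction, then a
# "### Response:" marker that the model continues from.
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the WizardLM prompt format."""
    return f"{instruction}\n### Response:"

prompt = build_prompt("What is a falcon?")
```

The generated text is whatever the model emits after the "### Response:" marker; custom templates can be substituted since the model has no baked-in chat format.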

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its uncensored training approach: alignment and moralizing responses were removed from the training data so that users can implement their own alignment, for example through an RLHF LoRA or other fine-tuning methods.

Q: What are the recommended use cases?

The model is suited for research and development purposes where custom alignment is desired. Users should note that it comes without guardrails and requires responsible implementation of safety measures.
