WizardLM-Uncensored-Falcon-40B-GPTQ
| Property | Value |
|---|---|
| Parameter Count | 6.17B |
| License | Apache 2.0 |
| Quantization | 4-bit GPTQ |
| Format | SafeTensors |
What is WizardLM-Uncensored-Falcon-40B-GPTQ?
This is a 4-bit quantized version of Eric Hartford's WizardLM Uncensored Falcon 40B model, optimized for efficient GPU inference. The base model was trained without the alignment constraints typically built into instruction-tuned models, so alignment can instead be added afterwards through fine-tuning of the user's choosing.
Implementation Details
The model is quantized with AutoGPTQ and requires AutoGPTQ v0.2.1 and PyTorch 2.0.0 with CUDA 11.7 or 11.8. It is quantized without group_size to minimize VRAM usage, and uses desc_act (act-order) to improve quantization accuracy.
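The settings described above would correspond to an AutoGPTQ quantization config along these lines. This is a sketch, not the repository's actual file: `bits`, `group_size`, and `desc_act` follow the text (group_size disabled is conventionally `-1`), while the remaining fields are common AutoGPTQ defaults assumed for illustration:

```json
{
  "bits": 4,
  "group_size": -1,
  "desc_act": true,
  "sym": true,
  "true_sequential": true,
  "damp_percent": 0.01
}
```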
- Supports multiple tensor types: I32, BF16, FP16
- Requires trust_remote_code for Falcon architecture support
- Implements WizardLM prompt format
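The WizardLM prompt format mentioned above is a simple single-turn scaffold: the instruction followed by a `### Response:` marker. A minimal helper sketch (the function name is hypothetical, and the exact whitespace around the marker is an assumption; check the model card's template before relying on it):

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in a single-turn WizardLM-style template."""
    return f"{instruction}\n\n### Response:\n"

# The model then generates text continuing after the "### Response:" marker.
prompt = build_prompt("Write a haiku about autumn.")
```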
Core Capabilities
- Efficient GPU inference with 4-bit precision
- Uncensored text generation without built-in alignment constraints
- Compatible with text-generation-webui
- Supports custom prompt templates
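To see why 4-bit precision matters for a ~40B-parameter model, a back-of-the-envelope weight-memory estimate helps. This is pure arithmetic and deliberately ignores activation memory, the KV cache, and per-layer quantization overhead such as scales and zero-points:

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate dense-weight storage in GiB: params * bits / 8 bytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

fp16_gib = weight_gib(40, 16)  # roughly 74.5 GiB of weights
int4_gib = weight_gib(40, 4)   # roughly 18.6 GiB of weights
```

The 4x reduction is what brings a 40B model within reach of a single 24-48 GB GPU, which is the point of the GPTQ release.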
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its uncensored training approach: alignment and moralizing responses were removed from the training data, leaving users free to implement custom alignment afterwards, for example via an RLHF LoRA or other fine-tuning methods.
Q: What are the recommended use cases?
The model is suited to research and development where custom alignment is desired. It ships without guardrails, so users are responsible for implementing appropriate safety measures before deployment.