WizardLM-Uncensored-Falcon-40B-GPTQ

Maintained By
TheBloke

Property          Value
Parameter Count   6.17B
License           Apache 2.0
Quantization      4-bit GPTQ
Format            SafeTensors

What is WizardLM-Uncensored-Falcon-40B-GPTQ?

This is a 4-bit GPTQ quantized version of Eric Hartford's WizardLM Uncensored Falcon 40B model, optimized for efficient GPU inference. The base model was deliberately trained without the usual alignment constraints, so that alignment can instead be added on top through subsequent fine-tuning.

Implementation Details

The model was quantized with AutoGPTQ and requires a specific setup for inference: AutoGPTQ v0.2.1 and PyTorch 2.0.0 with CUDA 11.7 or 11.8 support. The quantization was produced without group_size to minimize VRAM usage, and with desc_act (act-order) enabled for improved inference accuracy.

  • Supports multiple tensor types: I32, BF16, FP16
  • Requires trust_remote_code for Falcon architecture support
  • Implements WizardLM prompt format
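The requirements above can be sketched as a minimal loading routine. This is an illustrative sketch, not the maintainer's reference code: it assumes AutoGPTQ (v0.2.1 or compatible) and Transformers are installed, that the repository id follows TheBloke's naming, and that a CUDA GPU with sufficient VRAM is available.

```python
# Minimal loading sketch (assumptions: AutoGPTQ and Transformers installed,
# CUDA GPU available; verify exact arguments against the model card).
try:
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM
except ImportError:
    # Libraries not installed; the sketch below documents intended usage.
    AutoTokenizer = AutoGPTQForCausalLM = None

MODEL_ID = "TheBloke/WizardLM-Uncensored-Falcon-40B-GPTQ"

def load_model():
    """Load the 4-bit quantized model onto the first CUDA device."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoGPTQForCausalLM.from_quantized(
        MODEL_ID,
        use_safetensors=True,     # weights ship in SafeTensors format
        trust_remote_code=True,   # required for the Falcon architecture
        device="cuda:0",
    )
    return tokenizer, model
```

Because desc_act is enabled without group_size, older GPTQ loaders may not support the files; AutoGPTQ is the assumed-compatible path here.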

Core Capabilities

  • Efficient GPU inference with 4-bit precision
  • Uncensored text generation without built-in alignment constraints
  • Compatible with text-generation-webui
  • Supports custom prompt templates
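Since the card notes the model implements the WizardLM prompt format, a small helper shows what that format looks like. This is a sketch based on the WizardLM convention (instruction followed by a "### Response:" cue); the function name is illustrative.

```python
# Build a prompt in the WizardLM format: the instruction, then a
# "### Response:" marker that the model continues from.
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the WizardLM prompt format."""
    return f"{instruction}\n### Response:"

prompt = build_prompt("What is a falcon?")
```

The generated text is whatever the model emits after the "### Response:" marker; custom templates can be substituted since the model has no baked-in chat format.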

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its uncensored training approach: alignment and moralizing responses were removed from the training data so that users can implement their own alignment, for example through an RLHF LoRA or other fine-tuning methods.

Q: What are the recommended use cases?

The model is suited for research and development purposes where custom alignment is desired. Users should note that it comes without guardrails and requires responsible implementation of safety measures.
