# WizardLM-Uncensored-Falcon-7B-GPTQ
| Property | Value |
|---|---|
| Parameter Count | ~7B (stored as 4-bit weights) |
| Model Type | Text Generation |
| License | Apache 2.0 |
| Quantization | 4-bit GPTQ |
| Architecture | Falcon-based Transformer |
## What is WizardLM-Uncensored-Falcon-7B-GPTQ?
This model is a 4-bit GPTQ quantization of Eric Hartford's WizardLM-Uncensored-Falcon-7B, prepared for GPU inference with AutoGPTQ. Quantization trades a small amount of generation quality for a much smaller memory footprint, letting the 7B model run on consumer GPUs.
## Implementation Details
The model uses GPTQ quantization with a group size of 64 to preserve inference quality, and deliberately omits desc_act (act-order) to keep inference fast. It requires AutoGPTQ 0.2.0 or later and stores weights across several tensor types, including I32, BF16, and FP16 (a loading sketch follows the list below).
- Optimized 4-bit quantization for a reduced memory footprint
- Group size of 64 to balance quality against speed
- Pre-compiled wheels available for CUDA toolkit 11.7/11.8
- Requires trust_remote_code at load time, since Falcon ships custom modelling code
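
As a rough sketch of the loading path, the snippet below uses AutoGPTQ's `from_quantized` API. The repo id, `model_basename`, and the `use_safetensors` flag are illustrative assumptions; match them to the actual files in the checkpoint you download.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Assumed Hugging Face repo id and file basename; adjust to the checkpoint you use.
model_id = "TheBloke/WizardLM-Uncensored-Falcon-7B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# trust_remote_code is required because Falcon ships custom modelling code.
# The checkpoint was quantized without desc_act, so no act-order handling applies.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    model_basename="gptq_model-4bit-64g",  # assumed file name
    device="cuda:0",
    use_safetensors=True,
    use_triton=False,
    trust_remote_code=True,
)

prompt = "What is a falcon?\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Setting `use_triton=False` selects AutoGPTQ's CUDA kernels, which is what the pre-compiled 11.7/11.8 wheels target.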
## Core Capabilities
- Uncensored text generation without built-in alignment constraints
- Efficient GPU inference with reduced memory requirements
- Compatible with text-generation-webui interface
- Supports the WizardLM prompt format for instruction-style requests (see the template sketch after this list)
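
The WizardLM format is a single-turn template that places a "### Response:" cue after the instruction. A minimal helper, assuming that layout:

```python
def build_wizardlm_prompt(instruction: str) -> str:
    """Wrap a user instruction in the single-turn WizardLM template."""
    return f"{instruction}\n### Response:"

print(build_wizardlm_prompt("What is a falcon? Can I keep one as a pet?"))
```

In text-generation-webui, the equivalent is selecting a WizardLM-style instruction template rather than hand-building the prompt string.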
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its uncensored nature and efficient quantization, allowing for unrestricted text generation while maintaining reasonable hardware requirements. It's specifically designed to serve as a base for custom alignment through additional fine-tuning.
**Q: What are the recommended use cases?**
The model is best suited to research and development settings that require unrestricted text generation. Users are responsible for adding appropriate safeguards and content filtering for their specific applications.