Llama2 70B Chat Uncensored GPTQ
| Property | Value |
|---|---|
| Parameter Count | 70B |
| License | LLaMA 2 |
| Paper | QLoRA (arXiv:2305.14314) |
| Base Model | LLaMA 2 70B |
What is llama2_70b_chat_uncensored-GPTQ?
This is a GPTQ-quantized version of the Llama2 70B Chat Uncensored model, packaged for efficient deployment with minimal loss in quality. The underlying model was fine-tuned with QLoRA on an uncensored Wizard-Vicuna conversation dataset and is designed to give direct, unfiltered responses while maintaining factual accuracy.
Implementation Details
The repository offers multiple quantization options, including 3-bit and 4-bit versions with various group sizes, letting users trade VRAM usage against quantization accuracy. Each option lives in its own branch, covering deployment scenarios from minimal VRAM requirements to maximum inference quality.
- Multiple GPTQ parameter permutations available (3-bit and 4-bit); see the loading sketch after this list
- Group-size options from None (no grouping) to 128g
- Compatible with AutoGPTQ, Transformers, and ExLlama (4-bit versions only)
- Customizable inference parameters such as temperature and sampling settings
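A quantization branch is selected at load time via the `revision` argument. Below is a minimal loading sketch using Transformers with AutoGPTQ support; the repo id and the branch name `gptq-4bit-128g-actorder_True` are assumptions based on common naming in GPTQ repositories, so check the repository's branch list for the exact options available.

```python
# Minimal loading sketch, assuming the repo id
# "TheBloke/llama2_70b_chat_uncensored-GPTQ" and the branch name
# "gptq-4bit-128g-actorder_True" (check the repo for exact names).
# Requires transformers with optimum and auto-gptq installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/llama2_70b_chat_uncensored-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="gptq-4bit-128g-actorder_True",  # assumed branch; "main" holds the default build
    device_map="auto",                        # shard across available GPUs
)
```

Choosing a smaller group size (e.g. 32g) generally improves accuracy at the cost of higher VRAM usage, which is why the branches exist as separate permutations.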
Core Capabilities
- Straightforward, unfiltered responses to queries (see the generation sketch after this list)
- Efficient memory usage through quantization
- Support for a 4096-token context window
- Flexible deployment options for different hardware configurations
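The base model was fine-tuned on a `### HUMAN:` / `### RESPONSE:` prompt format, so prompts should follow that template. The sketch below reuses `model` and `tokenizer` from the loading example above and shows how temperature and sampling parameters can be customized; the specific values are illustrative, not recommendations.

```python
# Generation sketch reusing `model` and `tokenizer` from the loading
# example above. The "### HUMAN:" / "### RESPONSE:" template matches the
# base model's fine-tuning format; sampling values are illustrative.
prompt = "### HUMAN:\nExplain GPTQ quantization in one paragraph.\n\n### RESPONSE:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,  # prompt + output must fit the 4096-token context window
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```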
Frequently Asked Questions
Q: What makes this model unique?
This model combines large-scale capability (70B parameters) with uncensored fine-tuning, while GPTQ quantization makes it practical to deploy. It provides direct, unfiltered responses without the excessive safety constraints of the standard LLaMA 2 chat models.
Q: What are the recommended use cases?
The model suits applications that require direct, unfiltered language model responses while still maintaining factual accuracy. It is particularly useful in scenarios where standard chat models are overly cautious or patronizing in their responses.