Llama2 70B Chat Uncensored GPTQ
| Property | Value |
|---|---|
| Parameter Count | 70B |
| License | LLaMA 2 |
| Paper | QLoRA (arXiv:2305.14314) |
| Base Model | LLaMA 2 70B |
What is llama2_70b_chat_uncensored-GPTQ?
This is a GPTQ-quantized version of the Llama2 70B Chat Uncensored model, packaged for efficient deployment with minimal loss in quality. The underlying model was fine-tuned with QLoRA on an uncensored Wizard-Vicuna conversation dataset and is designed to give direct, unfiltered responses while maintaining factual accuracy.
Implementation Details
The repository offers multiple quantization options, including 3-bit and 4-bit versions with various group sizes, letting users trade VRAM usage against quantization accuracy. Each option lives in its own branch, covering deployment scenarios from minimal VRAM requirements to maximum inference quality.
- Multiple GPTQ parameter permutations available (3-bit and 4-bit); see the loading sketch after this list
- Group-size options from None (no grouping) to 128g
- Compatible with AutoGPTQ, Transformers, and ExLlama (4-bit versions only)
- Customizable inference parameters such as temperature and sampling settings
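A quantization branch is selected at load time via the `revision` argument. Below is a minimal loading sketch using Transformers with AutoGPTQ support; the repo id and the branch name `gptq-4bit-128g-actorder_True` are assumptions based on common naming in GPTQ repositories, so check the repository's branch list for the exact options available.

```python
# Minimal loading sketch, assuming the repo id
# "TheBloke/llama2_70b_chat_uncensored-GPTQ" and the branch name
# "gptq-4bit-128g-actorder_True" (check the repo for exact names).
# Requires transformers with optimum and auto-gptq installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/llama2_70b_chat_uncensored-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="gptq-4bit-128g-actorder_True",  # assumed branch; "main" holds the default build
    device_map="auto",                        # shard across available GPUs
)
```

Choosing a smaller group size (e.g. 32g) generally improves accuracy at the cost of higher VRAM usage, which is why the branches exist as separate permutations.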
Core Capabilities
- Straightforward, unfiltered responses to queries (see the generation sketch after this list)
- Efficient memory usage through quantization
- Support for a 4096-token context window
- Flexible deployment options for different hardware configurations
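The base model was fine-tuned on a `### HUMAN:` / `### RESPONSE:` prompt format, so prompts should follow that template. The sketch below reuses `model` and `tokenizer` from the loading example above and shows how temperature and sampling parameters can be customized; the specific values are illustrative, not recommendations.

```python
# Generation sketch reusing `model` and `tokenizer` from the loading
# example above. The "### HUMAN:" / "### RESPONSE:" template matches the
# base model's fine-tuning format; sampling values are illustrative.
prompt = "### HUMAN:\nExplain GPTQ quantization in one paragraph.\n\n### RESPONSE:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,  # prompt + output must fit the 4096-token context window
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```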
Frequently Asked Questions
Q: What makes this model unique?
This model combines large-scale capability (70B parameters) with uncensored fine-tuning, while GPTQ quantization makes it practical to deploy. It provides direct, unfiltered responses without the excessive safety constraints of the standard LLaMA 2 chat models.
Q: What are the recommended use cases?
The model suits applications that require direct, unfiltered language model responses while still maintaining factual accuracy. It is particularly useful in scenarios where standard chat models are overly cautious or patronizing in their responses.