llama2_70b_chat_uncensored-GPTQ

TheBloke

Uncensored 70B-parameter LLaMA2 chat model, GPTQ-quantized for efficient deployment. It offers multiple quantization options and gives straightforward, unfiltered responses.

| Property | Value |
|---|---|
| Parameter Count | 70B |
| License | LLaMA 2 |
| Paper | QLoRA (arXiv:2305.14314) |
| Base Model | LLaMA 2 70B |

What is llama2_70b_chat_uncensored-GPTQ?

This is a GPTQ-quantized version of the Llama2 70B Chat Uncensored model, optimized for efficient deployment with minimal loss in quality. The base model was fine-tuned on an uncensored Wizard-Vicuna conversation dataset and is designed to give direct, unfiltered responses while maintaining factual accuracy.

Implementation Details

The model is published in multiple quantization variants, including 3-bit and 4-bit versions with various group sizes, letting users trade VRAM usage against model accuracy. Each variant lives in its own repository branch, covering scenarios from minimal VRAM requirements to maximum inference quality.

  • Multiple GPTQ parameter permutations available (3-bit to 4-bit)
  • Group size options ranging from None to 128g
  • Compatible with AutoGPTQ, Transformers, and ExLlama (4-bit versions)
  • Customizable inference parameters for temperature and sampling
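Since each quantization variant lives in a separate branch, loading a specific one with Transformers comes down to passing the branch name as `revision`. The sketch below is illustrative: the branch names follow TheBloke's usual naming convention (e.g. `gptq-4bit-128g-actorder_True`) and the VRAM thresholds are rough estimates for a 70B model, not figures verified against this exact repository.

```python
def pick_branch(vram_gb: float) -> str:
    """Map an available-VRAM budget to a plausible quantization branch.

    Branch names and size cutoffs are illustrative assumptions based on
    TheBloke's typical GPTQ repo layout, not verified for this model.
    """
    # Rough footprints for a 70B model: 3-bit ~ 28-32 GB, 4-bit ~ 36-40 GB.
    if vram_gb < 36:
        return "gptq-3bit-128g-actorder_True"  # smallest footprint
    if vram_gb < 42:
        return "gptq-4bit-128g-actorder_True"  # better accuracy, more VRAM
    return "main"  # default 4-bit branch


# Usage sketch (downloads ~35+ GB of weights; needs transformers + auto-gptq):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# repo = "TheBloke/llama2_70b_chat_uncensored-GPTQ"
# tokenizer = AutoTokenizer.from_pretrained(repo)
# model = AutoModelForCausalLM.from_pretrained(
#     repo,
#     revision=pick_branch(40.0),  # select the quantization variant
#     device_map="auto",           # spread layers across available GPUs
# )
```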

Core Capabilities

  • Straightforward, unfiltered responses to queries
  • Efficient memory usage through quantization
  • Support for context window of 4096 tokens
  • Flexible deployment options for different hardware configurations

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its combination of large-scale capabilities (70B parameters) with uncensored training, while being optimized for practical deployment through GPTQ quantization. It provides direct, unfiltered responses without the excessive safety constraints of standard LLaMA 2 chat models.

Q: What are the recommended use cases?

The model is suitable for applications requiring direct and unfiltered language model responses, while still maintaining factual accuracy. It's particularly useful in scenarios where standard language models might be overly cautious or patronizing in their responses.
