llama2_7b_chat_uncensored-GGML

Maintained By
TheBloke

Llama2 7B Chat Uncensored GGML

Property      Value
Base Model    LLaMA 2 7B
Model Type    Chat/Conversational
License       Other
Format        GGML (Various quantizations)

What is llama2_7b_chat_uncensored-GGML?

This is a quantized version of George Sung's uncensored LLaMA 2 chat model, packaged in the GGML format for CPU and GPU inference. The original model was fine-tuned on the wizard_vicuna_70k_unfiltered dataset using QLoRA, producing an uncensored chat variant of LLaMA 2.

Implementation Details

The model is available in multiple quantization levels, from 2-bit to 8-bit, each offering a different tradeoff between model size, memory usage, and inference speed. For example, the q4_K_M variant uses 4-bit quantization with the k-quant method, requiring about 4.08 GB of storage and about 6.58 GB of RAM during operation.
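The q4_K_M figures above imply a gap of about 2.5 GB between file size and RAM use. The sketch below turns that into a rough fit check for other file sizes. It assumes the gap is a roughly constant runtime overhead, which the source does not state; the function names are hypothetical, and real usage depends on context length and any GPU offloading.

```python
# Figures quoted in the text for the q4_K_M variant, in GB.
Q4_K_M_FILE_GB = 4.08
Q4_K_M_RAM_GB = 6.58

# Assumption: the difference between RAM use and file size is a roughly
# fixed runtime overhead that also applies to other quantization levels.
RUNTIME_OVERHEAD_GB = round(Q4_K_M_RAM_GB - Q4_K_M_FILE_GB, 2)


def estimated_ram_gb(file_size_gb, overhead_gb=RUNTIME_OVERHEAD_GB):
    """Estimate RAM needed to run a quantized file of the given size."""
    return round(file_size_gb + overhead_gb, 2)


def fits_in_ram(file_size_gb, available_ram_gb):
    """Return True if the estimated RAM need is within the available RAM."""
    return estimated_ram_gb(file_size_gb) <= available_ram_gb


if __name__ == "__main__":
    print(estimated_ram_gb(Q4_K_M_FILE_GB))
    print(fits_in_ram(Q4_K_M_FILE_GB, available_ram_gb=8.0))
```

Treat the result as a first-pass filter when choosing a quantization level, then confirm against the per-file figures published with the model.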

  • Multiple quantization options (2-bit to 8-bit)
  • Supports both CPU and GPU inference
  • Uses the Human-Response prompt template
  • Compatible with various GGML-supporting frameworks
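As an illustration of the Human-Response prompt template mentioned in the list above, here is a minimal sketch of a prompt builder. The exact marker strings and spacing (`### HUMAN:` and `### RESPONSE:`) are an assumption about the template layout and should be checked against the model card; `build_prompt` is a hypothetical helper name.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user message in a Human-Response style prompt.

    Assumption: the template places the user text under a "### HUMAN:"
    marker and leaves an open "### RESPONSE:" marker for the model.
    """
    return f"### HUMAN:\n{user_message.strip()}\n\n### RESPONSE:\n"


if __name__ == "__main__":
    print(build_prompt("Explain what quantization does to a model."))
```

The returned string is what would be passed as the raw prompt to whichever GGML-compatible runtime loads the model file.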

Core Capabilities

  • Uncensored chat responses
  • Context window support up to 4096 tokens
  • Efficient inference on consumer hardware
  • Flexible deployment options across different platforms
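The 4096-token context window listed above means long conversations eventually need trimming. The sketch below keeps the most recent turns that fit a token budget. It is only an illustration: `rough_token_count` is a hypothetical estimator (about four characters per token), the 512-token reply reserve is an arbitrary choice, and a real deployment would count tokens with the model's own tokenizer.

```python
CONTEXT_WINDOW = 4096  # context size stated in the text


def rough_token_count(text):
    """Hypothetical estimate: about 4 characters per token, minimum 1."""
    return max(1, len(text) // 4)


def trim_history(turns, reserve_for_reply=512, limit=CONTEXT_WINDOW):
    """Keep the newest turns whose estimated tokens fit within the budget.

    The budget is the context limit minus the tokens reserved for the reply.
    Turns are dropped oldest-first; order of the kept turns is preserved.
    """
    budget = limit - reserve_for_reply
    kept = []
    used = 0
    for turn in reversed(turns):
        cost = rough_token_count(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    return kept


if __name__ == "__main__":
    history = ["a" * 4000, "b" * 8000, "c" * 4000]
    print(len(trim_history(history)))
```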

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the LLaMA 2 architecture with fine-tuning on unfiltered training data, and is distributed in quantized versions suited to practical deployment. The range of quantization options lets users balance model size, memory usage, and inference speed for their specific use case.

Q: What are the recommended use cases?

The model is particularly suited for applications requiring unrestricted conversation capabilities while operating under hardware constraints. The multiple quantization options make it versatile for deployment on different hardware configurations, from resource-constrained environments to high-performance systems.
