llama2_70b_chat_uncensored-GGML

Maintained By
TheBloke

LLaMA2 70B Chat Uncensored GGML

Property          Value
Base Model        LLaMA2 70B
Format            GGML (Deprecated)
License           LLaMA2
Paper             QLoRA (arxiv:2305.14314)
Training Dataset  wizard_vicuna_70k_unfiltered

What is llama2_70b_chat_uncensored-GGML?

This is a quantized version of the uncensored LLaMA2 70B chat model, packaged in the GGML format for CPU inference with optional GPU acceleration. The model was fine-tuned with QLoRA on an unfiltered conversation dataset (wizard_vicuna_70k_unfiltered), so it gives more direct, less restricted responses than the base LLaMA2 model.

Implementation Details

The model is available in multiple quantization levels (Q2_K through Q5_K_M), which trade file size (28.59GB to 48.75GB) against output quality: smaller files need less memory but lose more precision. It must be loaded with the '-gqa 8' argument, which sets the grouped-query attention factor used by the 70B architecture. It works with several inference frameworks, including llama.cpp, text-generation-webui, and KoboldCpp.

  • Multiple quantization options for different hardware capabilities
  • Supports GPU acceleration with both CUDA and Metal
  • Context window of 4096 tokens
  • Uses Human-Response prompt template
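
The settings above can be combined into a single llama.cpp invocation. The following is a minimal sketch, not the maintainer's own tooling. It assumes a llama.cpp build old enough to still read GGML files and accept '-gqa', with the binary at './main'. The model file name is hypothetical, and the exact template layout ("### HUMAN:" / "### RESPONSE:") is an assumption that should be checked against the upstream model card.

```python
import shlex

# Assumed layout of the Human-Response template; verify against the model card.
PROMPT_TEMPLATE = "### HUMAN:\n{prompt}\n\n### RESPONSE:\n"

MAX_CONTEXT = 4096  # context window stated for this model


def format_prompt(user_message: str) -> str:
    """Wrap a user message in the (assumed) Human-Response template."""
    return PROMPT_TEMPLATE.format(prompt=user_message.strip())


def build_llama_cpp_command(model_path, user_message, gpu_layers=0,
                            context_size=MAX_CONTEXT, binary="./main"):
    """Build an argument list for a GGML-era llama.cpp 'main' binary."""
    if not 0 < context_size <= MAX_CONTEXT:
        raise ValueError("context_size must be between 1 and 4096")
    if gpu_layers < 0:
        raise ValueError("gpu_layers must not be negative")
    return [
        binary,
        "-m", model_path,            # path to the quantized model file
        "-gqa", "8",                 # required for the 70B model
        "-c", str(context_size),     # context window in tokens
        "-ngl", str(gpu_layers),     # layers to offload to the GPU (0 = CPU only)
        "-p", format_prompt(user_message),
    ]


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    cmd = build_llama_cpp_command("model.ggmlv3.q4_K_M.bin",
                                  "What is GGML?", gpu_layers=40)
    print(shlex.join(cmd))
```

Returning a list rather than a shell string lets the command be passed to subprocess.run without quoting problems in the multi-line prompt.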

Core Capabilities

  • Provides straightforward, unfiltered responses
  • Reduces model size through quantization, with higher-bit levels preserving more accuracy
  • Supports partial GPU offloading, which speeds up inference when the model does not fit entirely in VRAM
  • Compatible with major GGML inference frameworks
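
For partial GPU offloading, a rough starting value for the number of layers to offload can be estimated from the file size. The sketch below is a back-of-the-envelope heuristic, not a method from the model card. It assumes the file is spread evenly across the 80 transformer layers of LLaMA2 70B and that a fixed amount of VRAM (2GB here, a guess) is kept free for context and scratch buffers. Real usage varies, so treat the result as a first value to tune.

```python
import math

N_LAYERS_70B = 80  # transformer layers in LLaMA2 70B


def estimate_gpu_layers(model_file_gb, vram_gb, reserve_gb=2.0,
                        n_layers=N_LAYERS_70B):
    """Estimate how many layers fit in VRAM (rough heuristic).

    Assumes every layer takes an equal share of the file, which ignores
    the embedding and output tensors and the KV cache.
    """
    if model_file_gb <= 0:
        raise ValueError("model_file_gb must be positive")
    usable_gb = vram_gb - reserve_gb
    if usable_gb <= 0:
        return 0
    per_layer_gb = model_file_gb / n_layers
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


if __name__ == "__main__":
    # File sizes are the smallest and largest listed for this model.
    for name, size_gb in [("Q2_K", 28.59), ("Q5_K_M", 48.75)]:
        print(name, estimate_gpu_layers(size_gb, vram_gb=24.0))
```

The returned number can be passed as the GPU layer count of whichever framework is used, then lowered if loading runs out of memory.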

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its uncensored approach to responses, providing direct answers without the typical AI safety filters, while maintaining the powerful capabilities of the 70B parameter architecture.

Q: What are the recommended use cases?

The model is suited for applications requiring direct, unfiltered responses, though users should note that GGML format is now deprecated in favor of GGUF format. It's particularly useful for CPU+GPU inference scenarios where straightforward interactions are preferred.
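
Because GGML and GGUF files are not interchangeable, it can help to check which format a downloaded file uses before choosing a runtime. The sketch below reads the first four bytes of the file and compares them with known magic values. The byte strings are assumptions based on how llama.cpp wrote its file headers as little-endian 32-bit integers (so the legacy 'ggjt' magic appears on disk as 'tjgg'). The function name and labels are hypothetical.

```python
# First four bytes on disk mapped to a format label (assumed values).
MAGIC_TO_FORMAT = {
    b"GGUF": "gguf",
    b"tjgg": "ggml (ggjt)",         # legacy GGML, versioned 'ggjt' container
    b"fmgg": "ggml (ggmf)",         # older legacy GGML container
    b"lmgg": "ggml (unversioned)",  # original unversioned GGML
}


def detect_model_format(path):
    """Return a format label based on the file's first four bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    return MAGIC_TO_FORMAT.get(magic, "unknown")


if __name__ == "__main__":
    import sys

    # Usage: python detect_format.py <model file> [<model file> ...]
    for model_path in sys.argv[1:]:
        print(model_path, "->", detect_model_format(model_path))
```

A file reported as legacy GGML needs a runtime build that still supports that format, while a GGUF file needs a newer one.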
