WizardLM 1.0 Uncensored Llama2 13B GGML
| Property | Value |
|---|---|
| Base Model | LLaMA2 13B |
| License | LLaMA2 |
| Format | GGML (CPU+GPU) |
| Author | Eric Hartford (original), TheBloke (quantized) |
What is WizardLM-1.0-Uncensored-Llama2-13B-GGML?
This is a GGML-format build of WizardLM 1.0 Uncensored, based on the LLaMA2 13B architecture. It is designed to give more direct, less restricted responses while maintaining output quality. The model ships in quantization variants from 2-bit to 8-bit, letting users trade output fidelity against memory and compute requirements.
Implementation Details
The model uses k-quant methods for quantization and is available in multiple variants. It supports CPU and GPU inference through llama.cpp and compatible frameworks.
- Multiple quantization options from 2-bit (5.51GB) to 8-bit (13.79GB)
- Supports a 2048-token context length (extendable with RoPE scaling)
- Uses Vicuna 1.1 style prompting format
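As a concrete illustration of the Vicuna 1.1 prompting format mentioned above, the small helper below assembles a prompt string; the default system line is an assumption, not taken from this model card.

```python
def build_vicuna_prompt(user_message: str,
                        system: str = "You are a helpful AI assistant.") -> str:
    """Assemble a Vicuna 1.1 style prompt: a system line, then
    USER/ASSISTANT turns. The model continues text after 'ASSISTANT:'."""
    return f"{system}\n\nUSER: {user_message}\nASSISTANT:"


prompt = build_vicuna_prompt("Why is the sky blue?")
```

Multi-turn chats follow the same pattern, appending each completed `ASSISTANT:` reply before the next `USER:` turn.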
Core Capabilities
- Reduced refusal patterns and behavioral constraints
- Flexible deployment options across different hardware configurations
- Enhanced direct response capabilities while maintaining output quality
- Support for various interface options including text-generation-webui and KoboldCpp
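For local inference via llama.cpp, an invocation might look like the following sketch; the filename, sampling settings, and GPU offload value are illustrative assumptions, not values from this model card.

```shell
# Hypothetical llama.cpp run: 4-bit variant, 2048-token context,
# offloading 32 layers to the GPU (-ngl); adjust for your hardware.
./main \
  -m wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q4_K_M.bin \
  -c 2048 -n 256 --temp 0.7 \
  -ngl 32 \
  -p "You are a helpful AI assistant.\n\nUSER: Why is the sky blue?\nASSISTANT:"
```

Dropping `-ngl` runs entirely on CPU; text-generation-webui and KoboldCpp expose equivalent settings through their UIs.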
Frequently Asked Questions
Q: What makes this model unique?
This model combines the capabilities of WizardLM with reduced constraints, while offering multiple quantization options for different hardware requirements. It's particularly notable for its balance between performance and resource usage.
Q: What are the recommended use cases?
The model is suitable for applications requiring direct and unrestricted responses, with deployment options ranging from resource-constrained environments (using 2-bit quantization) to high-performance systems (using 8-bit quantization).
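To make the size/hardware tradeoff concrete, the sketch below picks the largest quantization that fits a RAM budget, using only the two file sizes stated in this card; the 1.25x runtime overhead factor is an assumption, not a documented figure.

```python
from typing import Optional

# File sizes (GB) for the two quantization extremes stated in the model card.
QUANT_SIZES_GB = {"q2_K": 5.51, "q8_0": 13.79}


def pick_quant(ram_gb: float, overhead: float = 1.25) -> Optional[str]:
    """Return the largest quant whose file size, scaled by a rough
    runtime-overhead factor (an assumption), fits the RAM budget."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items()
               if size * overhead <= ram_gb}
    if not fitting:
        return None  # not even the 2-bit variant fits
    return max(fitting, key=fitting.get)
```

For example, an 8 GB machine lands on the 2-bit file, while 32 GB comfortably fits the 8-bit one.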