WizardLM 1.0 Uncensored Llama2 13B GGML
| Property | Value |
|---|---|
| Base Model | LLaMA2 13B |
| License | LLaMA2 |
| Format | GGML (CPU+GPU) |
| Author | Eric Hartford (original), TheBloke (quantized) |
What is WizardLM-1.0-Uncensored-Llama2-13B-GGML?
This is a GGML-format build of WizardLM 1.0 Uncensored, based on the LLaMA2 13B architecture. It is designed to give more direct, less restricted responses while maintaining output quality. The model ships in quantization variants from 2-bit to 8-bit, letting users trade output fidelity against memory and compute requirements.
Implementation Details
The model uses k-quant methods for quantization and is available in multiple variants. It supports CPU and GPU inference through llama.cpp and compatible frameworks.
- Multiple quantization options from 2-bit (5.51GB) to 8-bit (13.79GB)
- Supports a 2048-token context length (extendable with RoPE scaling)
- Uses Vicuna 1.1 style prompting format
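As a concrete illustration of the Vicuna 1.1 prompting format mentioned above, the small helper below assembles a prompt string; the default system line is an assumption, not taken from this model card.

```python
def build_vicuna_prompt(user_message: str,
                        system: str = "You are a helpful AI assistant.") -> str:
    """Assemble a Vicuna 1.1 style prompt: a system line, then
    USER/ASSISTANT turns. The model continues text after 'ASSISTANT:'."""
    return f"{system}\n\nUSER: {user_message}\nASSISTANT:"


prompt = build_vicuna_prompt("Why is the sky blue?")
```

Multi-turn chats follow the same pattern, appending each completed `ASSISTANT:` reply before the next `USER:` turn.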
Core Capabilities
- Reduced refusal patterns and behavioral constraints
- Flexible deployment options across different hardware configurations
- Enhanced direct response capabilities while maintaining output quality
- Support for various interface options including text-generation-webui and KoboldCpp
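For local inference via llama.cpp, an invocation might look like the following sketch; the filename, sampling settings, and GPU offload value are illustrative assumptions, not values from this model card.

```shell
# Hypothetical llama.cpp run: 4-bit variant, 2048-token context,
# offloading 32 layers to the GPU (-ngl); adjust for your hardware.
./main \
  -m wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q4_K_M.bin \
  -c 2048 -n 256 --temp 0.7 \
  -ngl 32 \
  -p "You are a helpful AI assistant.\n\nUSER: Why is the sky blue?\nASSISTANT:"
```

Dropping `-ngl` runs entirely on CPU; text-generation-webui and KoboldCpp expose equivalent settings through their UIs.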
Frequently Asked Questions
Q: What makes this model unique?
This model combines the capabilities of WizardLM with reduced constraints, while offering multiple quantization options for different hardware requirements. It's particularly notable for its balance between performance and resource usage.
Q: What are the recommended use cases?
The model is suitable for applications requiring direct and unrestricted responses, with deployment options ranging from resource-constrained environments (using 2-bit quantization) to high-performance systems (using 8-bit quantization).
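To make the size/hardware tradeoff concrete, the sketch below picks the largest quantization that fits a RAM budget, using only the two file sizes stated in this card; the 1.25x runtime overhead factor is an assumption, not a documented figure.

```python
from typing import Optional

# File sizes (GB) for the two quantization extremes stated in the model card.
QUANT_SIZES_GB = {"q2_K": 5.51, "q8_0": 13.79}


def pick_quant(ram_gb: float, overhead: float = 1.25) -> Optional[str]:
    """Return the largest quant whose file size, scaled by a rough
    runtime-overhead factor (an assumption), fits the RAM budget."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items()
               if size * overhead <= ram_gb}
    if not fitting:
        return None  # not even the 2-bit variant fits
    return max(fitting, key=fitting.get)
```

For example, an 8 GB machine lands on the 2-bit file, while 32 GB comfortably fits the 8-bit one.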