gpt4-x-vicuna-13B-GGML
| Property | Value |
|---|---|
| Base Model | Vicuna-13B-1.1 |
| License | Other |
| Format | GGML |
| Author | TheBloke |
What is gpt4-x-vicuna-13B-GGML?
gpt4-x-vicuna-13B-GGML is a quantized version of NousResearch's GPT4-x-Vicuna-13B model, prepared for CPU inference (with optional GPU offloading) via llama.cpp. The base model was fine-tuned on approximately 180,000 GPT-4-generated instructions, drawn from the GPTeacher, Roleplay v2, GPT-4-LLM Uncensored, and WizardLM Uncensored datasets, as well as the Nous Research Instruct Dataset.
Implementation Details
The model is offered in multiple quantization levels, from 2-bit to 8-bit, each trading off model size, output quality, and resource usage differently. Both the original llama.cpp quantization methods (q4_0, q4_1, q5_0, q5_1, q8_0) and the newer k-quant methods (q2_K through q6_K) are provided; the k-quants generally deliver better quality at a given file size.
- Model sizes range from 5.51GB (q2_K) to 13.83GB (q8_0)
- Compatible with llama.cpp and llama.cpp-based UIs such as text-generation-webui and KoboldCpp
- Supports GPU layer offloading for improved performance
- Uses the Alpaca prompt format for interactions (see the sketch below)
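For a concrete picture of how these pieces fit together, here is a minimal sketch using llama-cpp-python (Python bindings for llama.cpp) that loads a quantized file, offloads layers to the GPU, and sends an Alpaca-formatted prompt. The filename, layer count, and context size are illustrative assumptions, and GGML files require a pre-GGUF (0.1.x) release of llama-cpp-python:

```python
# Minimal sketch: run a GGML quant of this model with llama-cpp-python.
# Assumes a pre-GGUF (0.1.x) llama-cpp-python, since GGML support was
# dropped when the library moved to the GGUF format.
from llama_cpp import Llama

# Hypothetical local path; pick the quant file that fits your hardware.
MODEL_PATH = "gpt4-x-vicuna-13B.ggmlv3.q4_1.bin"

llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=2048,       # context window of the Vicuna-13B base
    n_gpu_layers=32,  # layers to offload if built with GPU support; 0 = CPU only
)

# Alpaca prompt format, as expected by this model.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what quantization does to a language model.\n\n"
    "### Response:\n"
)

output = llm(prompt, max_tokens=256, stop=["### Instruction:"], echo=False)
print(output["choices"][0]["text"])
```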
Core Capabilities
- Strong results on standard benchmarks (ARC, HellaSwag, PIQA)
- Training data cleaned of OpenAI refusal boilerplate (e.g. "As an AI language model...")
- Flexible deployment options for different hardware configurations
- Supports both instruction-following and chat-style interactions
Frequently Asked Questions
Q: What makes this model unique?
This model combines GPT-4-generated training data with a wide range of quantization options, making it usable across very different hardware setups while retaining good output quality. It is also notable for being uncensored, with refusal-style responses scrubbed from its training data.
Q: What are the recommended use cases?
The model is suitable for general text generation tasks, including instruction-following and chat applications. Users can pick a quantization level to match their hardware, with q4_1 offering a good balance between output quality and resource usage for most users.
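As a hedged sketch of how a quantization choice maps to a downloadable file, the snippet below fetches one quant from TheBloke's Hugging Face repository with huggingface_hub; the filename is an assumption based on TheBloke's usual ggmlv3 naming and should be checked against the repo's file list:

```python
# Sketch: download one quantization level from the Hugging Face Hub.
# The filename follows TheBloke's typical ggmlv3 naming convention and
# is an assumption -- verify it against the repository's file list.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/gpt4-x-vicuna-13B-GGML",
    filename="gpt4-x-vicuna-13B.ggmlv3.q4_1.bin",  # assumed filename
)
print(model_path)  # local cache path, ready to hand to llama.cpp
```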