gpt4-x-vicuna-13B-GGML
| Property | Value |
|---|---|
| Base Model | Vicuna-13B-1.1 |
| License | Other |
| Format | GGML |
| Author | TheBloke |
What is gpt4-x-vicuna-13B-GGML?
gpt4-x-vicuna-13B-GGML is a quantized version of NousResearch's GPT4-x-Vicuna-13B model, prepared for CPU inference (with optional GPU offloading) via llama.cpp. The base model was fine-tuned on approximately 180,000 GPT-4-generated instructions, drawn from the GPTeacher, Roleplay v2, GPT-4-LLM Uncensored, and WizardLM Uncensored datasets, as well as the Nous Research Instruct Dataset.
Implementation Details
The model is offered in multiple quantization levels, from 2-bit to 8-bit, each trading off model size, output quality, and resource usage differently. Both the original llama.cpp quantization methods (q4_0, q4_1, q5_0, q5_1, q8_0) and the newer k-quant methods (q2_K through q6_K) are provided; the k-quants generally deliver better quality at a given file size.
- Model sizes range from 5.51GB (q2_K) to 13.83GB (q8_0)
- Compatible with llama.cpp and llama.cpp-based UIs such as text-generation-webui and KoboldCpp
- Supports GPU layer offloading for improved performance
- Uses the Alpaca prompt format for interactions (see the sketch below)
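For a concrete picture of how these pieces fit together, here is a minimal sketch using llama-cpp-python (Python bindings for llama.cpp) that loads a quantized file, offloads layers to the GPU, and sends an Alpaca-formatted prompt. The filename, layer count, and context size are illustrative assumptions, and GGML files require a pre-GGUF (0.1.x) release of llama-cpp-python:

```python
# Minimal sketch: run a GGML quant of this model with llama-cpp-python.
# Assumes a pre-GGUF (0.1.x) llama-cpp-python, since GGML support was
# dropped when the library moved to the GGUF format.
from llama_cpp import Llama

# Hypothetical local path; pick the quant file that fits your hardware.
MODEL_PATH = "gpt4-x-vicuna-13B.ggmlv3.q4_1.bin"

llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=2048,       # context window of the Vicuna-13B base
    n_gpu_layers=32,  # layers to offload if built with GPU support; 0 = CPU only
)

# Alpaca prompt format, as expected by this model.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what quantization does to a language model.\n\n"
    "### Response:\n"
)

output = llm(prompt, max_tokens=256, stop=["### Instruction:"], echo=False)
print(output["choices"][0]["text"])
```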
Core Capabilities
- Strong results on standard benchmarks (ARC, HellaSwag, PIQA)
- Training data cleaned of OpenAI refusal boilerplate (e.g. "As an AI language model...")
- Flexible deployment options for different hardware configurations
- Supports both instruction-following and chat-style interactions
Frequently Asked Questions
Q: What makes this model unique?
This model combines GPT-4-generated training data with a wide range of quantization options, making it usable across very different hardware setups while retaining good output quality. It is also notable for being uncensored, with refusal-style responses scrubbed from its training data.
Q: What are the recommended use cases?
The model is suitable for general text generation tasks, including instruction-following and chat applications. Users can pick a quantization level to match their hardware, with q4_1 offering a good balance between output quality and resource usage for most users.
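As a hedged sketch of how a quantization choice maps to a downloadable file, the snippet below fetches one quant from TheBloke's Hugging Face repository with huggingface_hub; the filename is an assumption based on TheBloke's usual ggmlv3 naming and should be checked against the repo's file list:

```python
# Sketch: download one quantization level from the Hugging Face Hub.
# The filename follows TheBloke's typical ggmlv3 naming convention and
# is an assumption -- verify it against the repository's file list.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/gpt4-x-vicuna-13B-GGML",
    filename="gpt4-x-vicuna-13B.ggmlv3.q4_1.bin",  # assumed filename
)
print(model_path)  # local cache path, ready to hand to llama.cpp
```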