# Stable Vicuna 13B GGML
| Property | Value |
|---|---|
| Base Model | CarperAI Stable Vicuna 13B |
| Parameter Count | 13 billion |
| License | Other (non-commercial) |
| Paper | Based on LLaMA (arXiv:2302.13971) |
| Quantization Options | 2-bit to 8-bit GGML |
## What is stable-vicuna-13B-GGML?
Stable Vicuna 13B GGML is a quantized conversion of CarperAI's Stable Vicuna model into the GGML format, built for efficient CPU inference with optional GPU layer offloading. It is available in quantization levels from 2-bit to 8-bit, offering flexible trade-offs between file size, inference speed, and output quality.
## Implementation Details
The model comes in multiple quantization variants, each optimized for different use cases. The implementation includes both traditional quantization methods (q4_0, q4_1, q5_0, q5_1, q8_0) and newer k-quant methods (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K). File sizes range from 5.43GB for the q2_K version to 13.83GB for the q8_0 version.
- Compatible with llama.cpp and various UI frontends
- Supports GPU layer offloading for faster inference (see the loading sketch after this list)
- Includes the newer k-quant methods for better compression efficiency
- Requires 7.93GB to 16.33GB of RAM, depending on the quantization level
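As a rough illustration, the snippet below loads one of the quantized files with the llama-cpp-python bindings and offloads part of the model to the GPU. This is a minimal sketch, not an official loading procedure: it assumes a GGML-era build of llama-cpp-python (newer releases of llama.cpp and its bindings expect the GGUF format instead), and the file name and layer count are placeholders to adjust for your download and hardware.

```python
# Minimal loading sketch, assuming a GGML-era build of llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="stable-vicuna-13B.ggmlv3.q4_K_M.bin",  # placeholder: use the quant file you downloaded
    n_ctx=2048,        # LLaMA-based models use a 2048-token context window
    n_gpu_layers=32,   # number of layers to offload to the GPU; 0 = CPU only
    n_threads=8,       # CPU threads for the layers that stay on the CPU
)
```

Raising `n_gpu_layers` moves more of the model onto the GPU at the cost of VRAM; setting it to 0 keeps inference entirely on the CPU.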
## Core Capabilities
- Optimized for conversation and instruction-following tasks
- Supports both CPU and GPU inference
- Multiple quantization options for different hardware constraints (see the selection sketch after this list)
- Integration with popular frameworks like text-generation-webui and KoboldCpp
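Choosing a variant mostly comes down to fitting your RAM budget. The helper below is a hypothetical sketch of that decision; only the q2_K and q8_0 RAM figures come from this card, so the table should be extended with the remaining variants from the repository's file listing.

```python
# Hypothetical quant-picker: choose the largest variant that fits in RAM.
# Only the q2_K and q8_0 figures below appear on this card; fill in the
# other variants from the repository's file listing before relying on this.
RAM_REQUIRED_GB = {
    "q2_K": 7.93,   # from this card
    "q8_0": 16.33,  # from this card
}

def pick_quant(available_gb: float) -> str | None:
    """Return the most demanding variant that fits, or None if none fit."""
    fitting = {q: r for q, r in RAM_REQUIRED_GB.items() if r <= available_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant(12.0))  # -> "q2_K" with only the two documented entries
```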
## Frequently Asked Questions
### Q: What makes this model unique?
This model stands out for its range of quantization options and its optimization for CPU/GPU inference, making it accessible to users across a wide spectrum of hardware while maintaining good performance.
### Q: What are the recommended use cases?
The model is ideal for conversational AI applications, text generation, and instruction-following tasks. Users can choose different quantization levels based on their hardware constraints and performance requirements.
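For conversational use, Stable Vicuna is typically prompted with Vicuna-style `### Human:` / `### Assistant:` turn markers. The sketch below continues from the loader shown earlier (it assumes the `llm` object from that example) and is illustrative rather than canonical.

```python
# Illustrative chat-style completion, assuming the `llm` object created in
# the earlier loading sketch. The "### Human:" / "### Assistant:" markers
# follow the Vicuna-style prompt template commonly used with this model.
prompt = (
    "### Human: Explain the trade-off between 4-bit and 8-bit quantization "
    "in one paragraph.\n"
    "### Assistant:"
)
output = llm(
    prompt,
    max_tokens=256,
    temperature=0.7,
    stop=["### Human:"],  # stop before the model writes the next user turn
)
print(output["choices"][0]["text"].strip())
```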