stable-vicuna-13B-GGML

TheBloke

A 13B-parameter LLaMA-based model optimized for CPU/GPU inference with GGML quantization, offering compression levels from 2-bit to 8-bit.

  • Base Model: CarperAI Stable Vicuna 13B
  • Parameter Count: 13 billion
  • License: Other (non-commercial)
  • Paper: Based on LLaMA (arXiv:2302.13971)
  • Quantization Options: 2-bit to 8-bit GGML

What is stable-vicuna-13B-GGML?

Stable Vicuna 13B GGML is a highly optimized version of CarperAI's Stable Vicuna model, specifically converted for efficient CPU and GPU inference using the GGML framework. This implementation offers various quantization levels, from 2-bit to 8-bit, providing flexible trade-offs between model size, performance, and accuracy.

Implementation Details

The model comes in multiple quantization variants, each optimized for different use cases. The implementation includes both traditional quantization methods (q4_0, q4_1, q5_0, q5_1, q8_0) and newer k-quant methods (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K). File sizes range from 5.43GB for the q2_K version to 13.83GB for the q8_0 version.
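
As a concrete starting point, the sketch below fetches one quantized file from the Hugging Face Hub with the `huggingface_hub` library. The repository id and the filename pattern shown here are assumptions based on TheBloke's usual naming; check the repository's file list before running.

```python
# Minimal download sketch -- the repo id and filename are assumptions;
# verify them against the model page's file list before running.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/stable-vicuna-13B-GGML",     # assumed repository id
    filename="stable-vicuna-13B.ggmlv3.q4_0.bin",  # assumed name; swap in the quant you need
)
print(f"Downloaded to: {model_path}")
```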

  • Compatible with llama.cpp and various UI frameworks
  • Supports GPU layer offloading for improved performance (see the loading sketch after this list)
  • Includes the newer k-quant methods for better compression efficiency
  • Requires 7.93GB to 16.33GB of RAM depending on the quantization level (roughly the file size plus about 2.5GB of overhead)
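
A minimal loading sketch with llama-cpp-python follows. It assumes a release that still reads GGML files (0.1.78 or earlier; later releases expect the newer GGUF format), and the `n_gpu_layers` value is illustrative rather than a recommendation.

```python
# Sketch: load the GGML model and offload some layers to the GPU.
# Assumes llama-cpp-python <= 0.1.78, which still supports GGML files.
from llama_cpp import Llama

llm = Llama(
    model_path="stable-vicuna-13B.ggmlv3.q4_0.bin",  # path from the download step
    n_ctx=2048,        # context window size
    n_gpu_layers=32,   # illustrative; tune to fit your VRAM, 0 for CPU-only
)
```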

Core Capabilities

  • Optimized for conversation and instruction-following tasks (see the example after this list)
  • Supports both CPU and GPU inference
  • Multiple quantization options for different hardware constraints
  • Integration with popular frameworks like text-generation-webui and KoboldCpp
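
For conversational use, Stable Vicuna is commonly prompted with a `### Human:` / `### Assistant:` turn format. The sketch below assumes that template and reuses the `llm` object from the loading example above.

```python
# One conversational turn, assuming the common Stable Vicuna
# "### Human: ... ### Assistant:" prompt template.
prompt = "### Human: Explain GGML quantization in one sentence.\n### Assistant:"

output = llm(
    prompt,
    max_tokens=128,
    stop=["### Human:"],  # stop before the model invents the next user turn
)
print(output["choices"][0]["text"].strip())
```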

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its range of quantization options and its optimization for CPU/GPU inference, making it accessible to users with widely varying hardware while maintaining good performance.

Q: What are the recommended use cases?

The model is ideal for conversational AI applications, text generation, and instruction-following tasks. Users can choose different quantization levels based on their hardware constraints and performance requirements.
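
As a rough planning aid, the figures quoted earlier imply that CPU-only inference needs roughly the file size plus about 2.5GB of overhead. The helper below simply encodes that rule of thumb, checked against the two file sizes given in this card.

```python
# Rule-of-thumb RAM estimate for CPU-only inference, derived from the
# figures above: required RAM ~= file size + ~2.5 GB overhead.
def estimate_ram_gb(file_size_gb: float, overhead_gb: float = 2.5) -> float:
    return file_size_gb + overhead_gb

print(estimate_ram_gb(5.43))   # q2_K  -> ~7.93 GB
print(estimate_ram_gb(13.83))  # q8_0  -> ~16.33 GB
```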
