stable-vicuna-13B-GGML

TheBloke

A 13B-parameter LLaMA-based model optimized for CPU/GPU inference with GGML quantization, offering compression levels from 2-bit to 8-bit.

  • Base Model: CarperAI Stable Vicuna 13B
  • Parameter Count: 13 billion
  • License: Other (non-commercial)
  • Paper: Based on LLaMA (arXiv:2302.13971)
  • Quantization Options: 2-bit to 8-bit GGML

What is stable-vicuna-13B-GGML?

Stable Vicuna 13B GGML is a highly optimized version of CarperAI's Stable Vicuna model, specifically converted for efficient CPU and GPU inference using the GGML framework. This implementation offers various quantization levels, from 2-bit to 8-bit, providing flexible trade-offs between model size, performance, and accuracy.

Implementation Details

The model comes in multiple quantization variants, each optimized for different use cases. The implementation includes both traditional quantization methods (q4_0, q4_1, q5_0, q5_1, q8_0) and newer k-quant methods (q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K). File sizes range from 5.43GB for the q2_K version to 13.83GB for the q8_0 version.
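
As a concrete starting point, the sketch below fetches one quantized file from the Hugging Face Hub with the `huggingface_hub` library. The repository id and the filename pattern shown here are assumptions based on TheBloke's usual naming; check the repository's file list before running.

```python
# Minimal download sketch -- the repo id and filename are assumptions;
# verify them against the model page's file list before running.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/stable-vicuna-13B-GGML",     # assumed repository id
    filename="stable-vicuna-13B.ggmlv3.q4_0.bin",  # assumed name; swap in the quant you need
)
print(f"Downloaded to: {model_path}")
```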

  • Compatible with llama.cpp and various UI frameworks
  • Supports GPU layer offloading for improved performance (see the loading sketch after this list)
  • Includes the newer k-quant methods for better compression efficiency
  • Requires 7.93GB to 16.33GB of RAM depending on the quantization level (roughly the file size plus about 2.5GB of overhead)
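
A minimal loading sketch with llama-cpp-python follows. It assumes a release that still reads GGML files (0.1.78 or earlier; later releases expect the newer GGUF format), and the `n_gpu_layers` value is illustrative rather than a recommendation.

```python
# Sketch: load the GGML model and offload some layers to the GPU.
# Assumes llama-cpp-python <= 0.1.78, which still supports GGML files.
from llama_cpp import Llama

llm = Llama(
    model_path="stable-vicuna-13B.ggmlv3.q4_0.bin",  # path from the download step
    n_ctx=2048,        # context window size
    n_gpu_layers=32,   # illustrative; tune to fit your VRAM, 0 for CPU-only
)
```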

Core Capabilities

  • Optimized for conversation and instruction-following tasks (see the example after this list)
  • Supports both CPU and GPU inference
  • Multiple quantization options for different hardware constraints
  • Integration with popular frameworks like text-generation-webui and KoboldCpp
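
For conversational use, Stable Vicuna is commonly prompted with a `### Human:` / `### Assistant:` turn format. The sketch below assumes that template and reuses the `llm` object from the loading example above.

```python
# One conversational turn, assuming the common Stable Vicuna
# "### Human: ... ### Assistant:" prompt template.
prompt = "### Human: Explain GGML quantization in one sentence.\n### Assistant:"

output = llm(
    prompt,
    max_tokens=128,
    stop=["### Human:"],  # stop before the model invents the next user turn
)
print(output["choices"][0]["text"].strip())
```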

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its range of quantization options and its optimization for CPU/GPU inference, making it accessible to users with widely varying hardware while maintaining good performance.

Q: What are the recommended use cases?

The model is ideal for conversational AI applications, text generation, and instruction-following tasks. Users can choose different quantization levels based on their hardware constraints and performance requirements.
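
As a rough planning aid, the figures quoted earlier imply that CPU-only inference needs roughly the file size plus about 2.5GB of overhead. The helper below simply encodes that rule of thumb, checked against the two file sizes given in this card.

```python
# Rule-of-thumb RAM estimate for CPU-only inference, derived from the
# figures above: required RAM ~= file size + ~2.5 GB overhead.
def estimate_ram_gb(file_size_gb: float, overhead_gb: float = 2.5) -> float:
    return file_size_gb + overhead_gb

print(estimate_ram_gb(5.43))   # q2_K  -> ~7.93 GB
print(estimate_ram_gb(13.83))  # q8_0  -> ~16.33 GB
```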
