vicuna-13B-v1.5-16K-GGML

Maintained By
TheBloke

Vicuna 13B v1.5 16K GGML

Property         Value
Base Model       Llama 2
License          Llama 2 Community License Agreement
Context Length   16K tokens
Paper            Research Paper
Developer        LMSYS

What is vicuna-13B-v1.5-16K-GGML?

Vicuna-13B-v1.5-16K-GGML is a GGML-formatted version of the Vicuna chat assistant, fine-tuned from Llama 2 on approximately 125K user-shared conversations from ShareGPT. This version features an extended 16K context window through linear RoPE scaling and is available in multiple quantization formats for efficient deployment.
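Vicuna models expect a specific conversation template rather than a bare prompt. A minimal sketch of a formatter is below; the system message is the one commonly shown for Vicuna v1.1-style templates and should be treated as an assumption to verify against the official model card:

```python
# Assumed Vicuna v1.1-style system prompt; confirm against the model card.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def format_vicuna_prompt(user_message: str) -> str:
    """Wrap a single user turn in the Vicuna-style USER/ASSISTANT template."""
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"
```

For multi-turn conversations, prior turns are appended in the same USER/ASSISTANT alternation before the final "ASSISTANT:" marker.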

Implementation Details

The model is available in various quantization formats ranging from 2-bit to 8-bit precision, offering different trade-offs between model size, memory usage, and inference speed. The GGML format is optimized for CPU inference with optional GPU offloading, though it's worth noting that GGML was superseded by the GGUF format in August 2023, so newer tooling may no longer load these files.

  • Multiple quantization options (q2_K through q8_0) with sizes ranging from 5.51GB to 13.79GB
  • Implements new k-quant methods for improved efficiency
  • Compatible with popular frameworks like text-generation-webui and KoboldCpp
  • Supports GPU acceleration through CUDA and OpenCL
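As a rough illustration of the size/precision trade-off, on-disk size scales with bits per weight. The bits-per-weight figures below are approximate community estimates (assumptions, not official numbers), but they reproduce the 5.51GB to 13.79GB range quoted above reasonably well:

```python
# Approximate bits per weight for selected GGML quantization formats.
# These are rough community figures (assumptions), not official values.
BITS_PER_WEIGHT = {
    "q2_K": 3.35,    # k-quant mix of 2- and 3-bit blocks
    "q4_K_M": 4.8,   # k-quant, medium 4-bit mix
    "q5_K_M": 5.7,   # k-quant, medium 5-bit mix
    "q8_0": 8.5,     # 8-bit weights plus per-block scales
}

def estimated_size_gb(n_params: float, quant: str) -> float:
    """Rough file size in GB: parameters * bits / 8, ignoring metadata."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9
```

For a 13B-parameter model, this puts q2_K near 5.4GB and q8_0 near 13.8GB, in line with the sizes listed for this release.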

Core Capabilities

  • Extended 16K token context window for handling longer conversations
  • Specialized in chat-based interactions and detailed responses
  • Maintains coherent conversation flow with proper context handling
  • Supports various inference options including CPU and GPU acceleration
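The 16K window comes from linear RoPE scaling: position indices are compressed by a fixed factor (here 4096/16384 = 0.25) so that positions up to 16K map into the range the Llama 2 base model saw during training. A minimal sketch of the position mapping:

```python
TRAINED_CTX = 4096   # Llama 2's native context length
TARGET_CTX = 16384   # extended window for this model

def scaled_position(position: int) -> float:
    """Linearly compress a position index so positions up to TARGET_CTX
    fall within the [0, TRAINED_CTX] range used during pretraining."""
    return position * (TRAINED_CTX / TARGET_CTX)
```

In practice, inference runtimes apply this factor inside the rotary embedding computation; when loading the model you must also request the larger context size, or the extension has no effect.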

Frequently Asked Questions

Q: What makes this model unique?

This model combines the robust capabilities of Llama 2 with extensive fine-tuning on real-world conversations, while offering an extended 16K context window. The GGML format and various quantization options make it highly accessible for different hardware configurations.

Q: What are the recommended use cases?

The model is primarily intended for research on large language models and chatbots. It's particularly suitable for researchers and hobbyists in NLP, machine learning, and AI who need a capable chat assistant that can handle extended conversations.
