vicuna-13B-v1.5-16K-GGML

Maintained By
TheBloke

Vicuna 13B v1.5 16K GGML

Property         Value
Base Model       Llama 2
License          Llama 2 Community License Agreement
Context Length   16K tokens
Paper            Research Paper
Developer        LMSYS

What is vicuna-13B-v1.5-16K-GGML?

Vicuna-13B-v1.5-16K-GGML is a GGML-formatted version of the Vicuna chat assistant, fine-tuned from Llama 2 on approximately 125K user-shared conversations from ShareGPT. This version features an extended 16K context window through linear RoPE scaling and is available in multiple quantization formats for efficient deployment.
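Vicuna models expect a specific conversation template rather than a bare prompt. A minimal sketch of a formatter is below; the system message is the one commonly shown for Vicuna v1.1-style templates and should be treated as an assumption to verify against the official model card:

```python
# Assumed Vicuna v1.1-style system prompt; confirm against the model card.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def format_vicuna_prompt(user_message: str) -> str:
    """Wrap a single user turn in the Vicuna-style USER/ASSISTANT template."""
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"
```

For multi-turn conversations, prior turns are appended in the same USER/ASSISTANT alternation before the final "ASSISTANT:" marker.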

Implementation Details

The model is available in various quantization formats ranging from 2-bit to 8-bit precision, offering different trade-offs between model size, memory usage, and inference speed. The GGML format is optimized for CPU inference with optional GPU offloading, though it's worth noting that GGML was superseded by the GGUF format in August 2023, so newer tooling may no longer load these files.

  • Multiple quantization options (q2_K through q8_0) with sizes ranging from 5.51GB to 13.79GB
  • Implements new k-quant methods for improved efficiency
  • Compatible with popular frameworks like text-generation-webui and KoboldCpp
  • Supports GPU acceleration through CUDA and OpenCL
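As a rough illustration of the size/precision trade-off, on-disk size scales with bits per weight. The bits-per-weight figures below are approximate community estimates (assumptions, not official numbers), but they reproduce the 5.51GB to 13.79GB range quoted above reasonably well:

```python
# Approximate bits per weight for selected GGML quantization formats.
# These are rough community figures (assumptions), not official values.
BITS_PER_WEIGHT = {
    "q2_K": 3.35,    # k-quant mix of 2- and 3-bit blocks
    "q4_K_M": 4.8,   # k-quant, medium 4-bit mix
    "q5_K_M": 5.7,   # k-quant, medium 5-bit mix
    "q8_0": 8.5,     # 8-bit weights plus per-block scales
}

def estimated_size_gb(n_params: float, quant: str) -> float:
    """Rough file size in GB: parameters * bits / 8, ignoring metadata."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9
```

For a 13B-parameter model, this puts q2_K near 5.4GB and q8_0 near 13.8GB, in line with the sizes listed for this release.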

Core Capabilities

  • Extended 16K token context window for handling longer conversations
  • Specialized in chat-based interactions and detailed responses
  • Maintains coherent conversation flow with proper context handling
  • Supports various inference options including CPU and GPU acceleration
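The 16K window comes from linear RoPE scaling: position indices are compressed by a fixed factor (here 4096/16384 = 0.25) so that positions up to 16K map into the range the Llama 2 base model saw during training. A minimal sketch of the position mapping:

```python
TRAINED_CTX = 4096   # Llama 2's native context length
TARGET_CTX = 16384   # extended window for this model

def scaled_position(position: int) -> float:
    """Linearly compress a position index so positions up to TARGET_CTX
    fall within the [0, TRAINED_CTX] range used during pretraining."""
    return position * (TRAINED_CTX / TARGET_CTX)
```

In practice, inference runtimes apply this factor inside the rotary embedding computation; when loading the model you must also request the larger context size, or the extension has no effect.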

Frequently Asked Questions

Q: What makes this model unique?

This model combines the robust capabilities of Llama 2 with extensive fine-tuning on real-world conversations, while offering an extended 16K context window. The GGML format and various quantization options make it highly accessible for different hardware configurations.

Q: What are the recommended use cases?

The model is primarily intended for research on large language models and chatbots. It's particularly suitable for researchers and hobbyists in NLP, machine learning, and AI who need a capable chat assistant that can handle extended conversations.
