Vicuna-13B-1.1-GPTQ

Maintained By
TheBloke

License: Apache License 2.0
Training Data: 70K conversations from ShareGPT
Quantization: 4-bit GPTQ
Base Model: LLaMA 13B

What is Vicuna-13B-1.1-GPTQ?

Vicuna-13B-1.1-GPTQ is a quantized version of the Vicuna language model, optimized for efficient deployment while maintaining high performance. It is created by merging the published Vicuna delta weights into the original LLaMA 13B model and then quantizing the merged weights to 4-bit precision with GPTQ-for-LLaMa. The result is a significantly smaller model that preserves most of the original model's capabilities.
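As a rough illustration of the delta-merge step, here is a conceptual sketch in Python. The local paths are hypothetical, and the actual release used FastChat's apply_delta tooling; treat this as an assumption about the mechanics rather than the exact script used:

```python
# Conceptual sketch of the delta-weight merge (hypothetical paths; the real
# release used FastChat's apply_delta script, which works the same way).
import torch
from transformers import AutoModelForCausalLM

# Load the original LLaMA 13B base and the published Vicuna v1.1 delta.
base = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b", torch_dtype=torch.float16
)
delta = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-13b-delta-v1.1", torch_dtype=torch.float16
)

# Vicuna weights = LLaMA weights + delta, applied tensor by tensor.
delta_state = delta.state_dict()
for name, param in base.state_dict().items():
    param.data += delta_state[name]

# The merged model is what then gets quantized to 4-bit with GPTQ.
base.save_pretrained("path/to/vicuna-13b-v1.1-merged")
```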

Implementation Details

The model uses 4-bit quantization with a group size of 128 and is distributed in both safetensors and pt formats. It is designed for GPU deployment and requires a compatible GPTQ-for-LLaMa implementation for best performance; a loading sketch follows the list below.

  • Supports both act-order and non-act-order variants
  • Compatible with text-generation-webui
  • Uses the Vicuna v1.1 tokenization, with the EOS token "</s>" as the turn separator
  • Optimized for conversational AI applications
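As a minimal loading sketch, the snippet below uses the AutoGPTQ library, which can read GPTQ-for-LLaMa-style checkpoints; the repository id is an assumption, and the card itself targets GPTQ-for-LLaMa and text-generation-webui:

```python
# Minimal loading sketch using the AutoGPTQ library (an assumption; the card
# itself targets GPTQ-for-LLaMa, whose checkpoints AutoGPTQ can also read).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/vicuna-13B-1.1-GPTQ"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# use_safetensors selects the .safetensors checkpoint; device places the
# 4-bit weights (group size 128) on the first GPU.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    use_safetensors=True,
    device="cuda:0",
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```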

Core Capabilities

  • High-quality conversational responses
  • Efficient memory usage through 4-bit quantization
  • Maintains base model performance while reducing resource requirements
  • Designed for GPU inference (CPU inference typically uses a separate, non-GPTQ conversion instead)
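To illustrate the conversational format behind these capabilities, here is a hedged sketch of the Vicuna v1.1 prompt template, in which completed turns end with the EOS token "</s>"; the system preamble is the one commonly used with Vicuna, and the helper function is purely illustrative:

```python
# Sketch of the Vicuna v1.1 chat template: turns are separated by the EOS
# token "</s>" instead of the "###" delimiter used by v0.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the "
    "user's questions."
)

def build_prompt(history, user_message):
    """history is a list of completed (user, assistant) turn pairs."""
    prompt = SYSTEM + " "
    for user_turn, assistant_turn in history:
        prompt += f"USER: {user_turn} ASSISTANT: {assistant_turn}</s>"
    prompt += f"USER: {user_message} ASSISTANT:"
    return prompt

print(build_prompt([], "Explain 4-bit GPTQ quantization in one sentence."))
```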

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient 4-bit quantization while maintaining the high-quality performance of Vicuna 13B. It's specifically optimized for practical deployment scenarios where memory efficiency is crucial.

Q: What are the recommended use cases?

The model is primarily intended for research in natural language processing, machine learning, and artificial intelligence. It's particularly well-suited for chatbot applications and conversational AI research where resource efficiency is important.

🍰 Interested in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.