llava-13b-v0-4bit-128g
| Property | Value |
|---|---|
| Original Model | LLaVA-13B-v0 |
| Quantization | 4-bit GPTQ |
| Group Size | 128 |
| License | Other |
| Author | wojtab |
What is llava-13b-v0-4bit-128g?
llava-13b-v0-4bit-128g is a quantized version of the LLaVA (Large Language and Vision Assistant) model, packaged for efficient deployment. It uses 4-bit GPTQ quantization with a group size of 128, which significantly reduces the model's memory footprint while preserving its multimodal capabilities.
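As a rough back-of-the-envelope sketch (the figures below are approximations: the vision tower and some layers remain in higher precision, and the per-group scales add a small overhead), the weight-memory saving from 4-bit quantization can be estimated like this:

```python
# Back-of-the-envelope weight-memory estimate; all figures are approximate.
N_PARAMS = 13e9        # ~13B parameters in the LLaMA backbone
GROUP_SIZE = 128       # one fp16 scale (plus a packed zero-point) per group

fp16_gib = N_PARAMS * 2 / 2**30                  # 16-bit weights
int4_gib = N_PARAMS * 0.5 / 2**30                # packed 4-bit weights
meta_gib = (N_PARAMS / GROUP_SIZE) * 2 / 2**30   # ~2 bytes of scale metadata per group

print(f"fp16 weights : {fp16_gib:5.1f} GiB")             # ~24 GiB
print(f"4-bit weights: {int4_gib + meta_gib:5.1f} GiB")  # ~6 GiB incl. group metadata
```

In other words, roughly a 4x reduction in weight memory, before activations and the KV cache are counted.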
Implementation Details
The model was quantized using the GPTQ-for-LLaMa framework (CUDA branch, commit 57a2629), targeting the LLaMA language-model component of LLaVA. The quantization run used true-sequential processing and a group size of 128 to balance compression against output quality.
- Utilizes 4-bit quantization for efficient memory usage
- Uses a group size of 128 to balance compression and accuracy (a schematic sketch follows this list)
- Compatible with text-generation-webui's LLaVA extension
- Based on the original LLaVA-13B-delta-v0 model
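To make the group-size parameter concrete, here is a minimal round-to-nearest sketch of group-wise 4-bit quantization in PyTorch. This is illustrative only and not the GPTQ algorithm itself (GPTQ additionally chooses the rounded values so as to compensate quantization error layer by layer), but it shows what one scale and zero-point per 128 weights means:

```python
import torch

def quantize_groupwise_4bit(w: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with one scale/zero-point per group.

    Illustrative only: GPTQ picks the rounded values more carefully to
    minimise layer output error, but the storage layout is the same idea.
    """
    w = w.reshape(-1, group_size)                    # one row per group of 128 weights
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0   # 4 bits -> 16 levels (0..15)
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale + zero), 0, 15)
    return q.to(torch.uint8), scale, zero

def dequantize(q, scale, zero):
    return (q.float() - zero) * scale

w = torch.randn(4096 * 128)                          # toy weight vector
q, scale, zero = quantize_groupwise_4bit(w)
w_hat = dequantize(q, scale, zero)
print("mean abs error:", (w.reshape(-1, 128) - w_hat).abs().mean().item())
```

Smaller groups track the weight distribution more closely (better accuracy, more metadata); 128 is a common middle ground.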
Core Capabilities
- Efficient memory utilization through 4-bit quantization
- Maintains multimodal understanding capabilities
- Seamless integration with text-generation-webui
- Reduced storage requirements while preserving functionality
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient 4-bit quantization of the LLaVA architecture, making it more accessible for deployment on hardware with limited resources while maintaining the core capabilities of the original model.
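As a purely hypothetical illustration of what a 4-bit checkpoint contains, the sketch below lists the packed tensors in a GPTQ-style safetensors file. The file name is a placeholder, and the tensor names (qweight, qzeros, scales) follow the GPTQ-for-LLaMa packing convention rather than anything guaranteed by this repository:

```python
from safetensors import safe_open

# Placeholder path: point this at the downloaded quantized checkpoint.
CKPT = "llava-13b-v0-4bit-128g.safetensors"

with safe_open(CKPT, framework="pt", device="cpu") as f:
    for name in f.keys():
        # In the GPTQ-for-LLaMa convention, each quantized linear layer stores
        # packed integer 'qweight' plus per-group 'scales' and packed 'qzeros';
        # embeddings, norms, and the vision projector stay in higher precision.
        if name.endswith(("qweight", "qzeros", "scales")):
            t = f.get_tensor(name)
            print(f"{name:60s} {tuple(t.shape)} {t.dtype}")
```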
Q: What are the recommended use cases?
The model is ideal for applications requiring multimodal understanding in resource-constrained environments, particularly when using the text-generation-webui interface with the LLaVA extension. It's suitable for tasks that require both vision and language processing capabilities.