llava-13b-v0-4bit-128g
| Property | Value |
|---|---|
| Original Model | LLaVA-13B-v0 |
| Quantization | 4-bit GPTQ |
| Group Size | 128 |
| License | Other |
| Author | wojtab |
What is llava-13b-v0-4bit-128g?
llava-13b-v0-4bit-128g is a quantized version of the LLaVA (Large Language and Vision Assistant) model, packaged for efficient deployment. It uses 4-bit GPTQ quantization with a group size of 128, which significantly reduces the model's memory footprint while preserving its multimodal capabilities.
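As a rough back-of-the-envelope sketch (the figures below are approximations: the vision tower and some layers remain in higher precision, and the per-group scales add a small overhead), the weight-memory saving from 4-bit quantization can be estimated like this:

```python
# Back-of-the-envelope weight-memory estimate; all figures are approximate.
N_PARAMS = 13e9        # ~13B parameters in the LLaMA backbone
GROUP_SIZE = 128       # one fp16 scale (plus a packed zero-point) per group

fp16_gib = N_PARAMS * 2 / 2**30                  # 16-bit weights
int4_gib = N_PARAMS * 0.5 / 2**30                # packed 4-bit weights
meta_gib = (N_PARAMS / GROUP_SIZE) * 2 / 2**30   # ~2 bytes of scale metadata per group

print(f"fp16 weights : {fp16_gib:5.1f} GiB")             # ~24 GiB
print(f"4-bit weights: {int4_gib + meta_gib:5.1f} GiB")  # ~6 GiB incl. group metadata
```

In other words, roughly a 4x reduction in weight memory, before activations and the KV cache are counted.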
Implementation Details
The model was quantized using the GPTQ-for-LLaMa framework (CUDA branch, commit 57a2629), targeting the LLaMA language-model component of LLaVA. The quantization run used true-sequential processing and a group size of 128 to balance compression against output quality.
- Utilizes 4-bit quantization for efficient memory usage
- Uses a group size of 128 to balance compression and accuracy (a schematic sketch follows this list)
- Compatible with text-generation-webui's LLaVA extension
- Based on the original LLaVA-13B-delta-v0 model
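To make the group-size parameter concrete, here is a minimal round-to-nearest sketch of group-wise 4-bit quantization in PyTorch. This is illustrative only and not the GPTQ algorithm itself (GPTQ additionally chooses the rounded values so as to compensate quantization error layer by layer), but it shows what one scale and zero-point per 128 weights means:

```python
import torch

def quantize_groupwise_4bit(w: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with one scale/zero-point per group.

    Illustrative only: GPTQ picks the rounded values more carefully to
    minimise layer output error, but the storage layout is the same idea.
    """
    w = w.reshape(-1, group_size)                    # one row per group of 128 weights
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0   # 4 bits -> 16 levels (0..15)
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale + zero), 0, 15)
    return q.to(torch.uint8), scale, zero

def dequantize(q, scale, zero):
    return (q.float() - zero) * scale

w = torch.randn(4096 * 128)                          # toy weight vector
q, scale, zero = quantize_groupwise_4bit(w)
w_hat = dequantize(q, scale, zero)
print("mean abs error:", (w.reshape(-1, 128) - w_hat).abs().mean().item())
```

Smaller groups track the weight distribution more closely (better accuracy, more metadata); 128 is a common middle ground.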
Core Capabilities
- Efficient memory utilization through 4-bit quantization
- Maintains multimodal understanding capabilities
- Seamless integration with text-generation-webui
- Reduced storage requirements while preserving functionality
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient 4-bit quantization of the LLaVA architecture, making it more accessible for deployment on hardware with limited resources while maintaining the core capabilities of the original model.
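As a purely hypothetical illustration of what a 4-bit checkpoint contains, the sketch below lists the packed tensors in a GPTQ-style safetensors file. The file name is a placeholder, and the tensor names (qweight, qzeros, scales) follow the GPTQ-for-LLaMa packing convention rather than anything guaranteed by this repository:

```python
from safetensors import safe_open

# Placeholder path: point this at the downloaded quantized checkpoint.
CKPT = "llava-13b-v0-4bit-128g.safetensors"

with safe_open(CKPT, framework="pt", device="cpu") as f:
    for name in f.keys():
        # In the GPTQ-for-LLaMa convention, each quantized linear layer stores
        # packed integer 'qweight' plus per-group 'scales' and packed 'qzeros';
        # embeddings, norms, and the vision projector stay in higher precision.
        if name.endswith(("qweight", "qzeros", "scales")):
            t = f.get_tensor(name)
            print(f"{name:60s} {tuple(t.shape)} {t.dtype}")
```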
Q: What are the recommended use cases?
The model is ideal for applications requiring multimodal understanding in resource-constrained environments, particularly when using the text-generation-webui interface with the LLaVA extension. It's suitable for tasks that require both vision and language processing capabilities.