# GPT4-x-Vicuna-13b-4bit
| Property | Value |
|---|---|
| License | GPL |
| Base Model | Vicuna-13b-1.1 |
| Quantization | 4-bit GPTQ (groupsize 128) |
| Training Data Size | ~180k instructions |
## What is GPT4-x-Vicuna-13b-4bit?
GPT4-x-Vicuna-13b-4bit is a quantized language model developed by NousResearch, built on the Vicuna-13b-1.1 base. It aims to make capable instruction following accessible on modest hardware: the model is fine-tuned on GPT-4-generated instruction data while 4-bit quantization keeps the memory footprint small. Fine-tuning drew on a diverse set of instruction datasets, including GPTeacher, Roleplay v2, WizardLM Uncensored, and the Nous Research Instruct Dataset.
## Implementation Details
The model uses GPTQ 4-bit quantization with a groupsize of 128, which significantly reduces memory usage while preserving most of the base model's quality. Training ran for 5 epochs on 8 A100-80GB GPUs using the Alpaca DeepSpeed training code. The model supports two Alpaca-style prompt formats: a basic instruction-response format and an instruction-input-response format for tasks that include additional context.
- Trained on approximately 180,000 GPT-4 generated instructions
- Cleaned dataset removing OpenAI censorship patterns
- Optimized for reduced memory usage through 4-bit quantization
- Built on the robust Vicuna-13b-1.1 architecture
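The two Alpaca-style prompt formats described above can be sketched as a small helper. The template strings below follow the standard Alpaca convention; this is an illustrative assumption, so verify the exact wording against the model card before relying on it:

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-style prompt, with or without an input field.

    Assumes the standard Alpaca template strings; the model card's
    exact templates should be checked before production use.
    """
    if input_text:
        # Instruction-input-response format: used when the task needs context.
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Basic instruction-response format.
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```

The generated text after `### Response:` is the model's answer; stopping generation at the next `###` marker is a common decoding choice with this format.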
## Core Capabilities
- High-quality instruction following
- Reduced censorship compared to base models
- Efficient memory usage through quantization
- Support for multiple prompt formats
- Enhanced performance through specialized training datasets
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its combination of instruction-following quality derived from GPT-4-generated training data, reduced censorship, and efficient 4-bit quantization, making it practical to deploy on consumer hardware with little loss in performance.
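A rough back-of-the-envelope estimate illustrates why 4-bit quantization matters for consumer hardware. The figures below are illustrative assumptions, not measurements from the model card:

```python
# Rough VRAM estimate for a 13B-parameter model at 4-bit GPTQ, groupsize 128.
# All numbers here are illustrative estimates, not benchmarks.
params = 13e9
bits_per_weight = 4
weight_bytes = params * bits_per_weight / 8  # ~6.5 GB of packed weights

# Groupsize-128 GPTQ stores per-group scales and zero-points; assume
# roughly 0.25 extra bits per weight of quantization metadata.
overhead_bytes = params * 0.25 / 8

total_gb = (weight_bytes + overhead_bytes) / 1e9
print(f"~{total_gb:.1f} GB for weights alone")  # activations/KV cache are extra
```

By comparison, the same 13B model in fp16 needs about 26 GB for weights alone, which is why the unquantized model is out of reach for most single consumer GPUs.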
**Q: What are the recommended use cases?**
The model is well-suited for instruction-following tasks, conversational AI applications, and scenarios requiring reduced censorship. It's particularly valuable for users seeking a balance between performance and resource efficiency.