Airoboros LLaMA2 70B GPT4 GPTQ
| Property | Value |
|---|---|
| Base Model | LLaMA 2 70B |
| Parameter Count | 70B (weights stored in 3- or 4-bit GPTQ form) |
| License | Other (Meta Llama 2 license terms apply) |
| Quantization | GPTQ (multiple variants) |
What is airoboros-l2-70B-gpt4-1.4.1-GPTQ?
This is a GPTQ-quantized version of the Airoboros LLaMA 2 70B model, compressed to reduce memory requirements while largely preserving output quality. It is based on Meta's LLaMA 2 architecture and was fine-tuned on GPT-4-generated data to produce detailed, uncensored responses across a wide range of tasks.
Implementation Details
The model is offered in multiple quantization variants, from 3-bit to 4-bit precision, with various group sizes and Act Order configurations. The main branch provides a 4-bit version with Act Order enabled and no grouping, which keeps VRAM requirements low. The model uses a specific chat prompt template and is compatible with inference frameworks such as ExLlama and text-generation-webui.
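The chat template is a minimal sketch of the Vicuna-style single-turn format published for Airoboros 1.4.x; the exact system message is an assumption taken from upstream model-card descriptions, so verify it against the repository you download before relying on it.

```python
# Assumed Vicuna-style prompt template for Airoboros 1.4.x; check the
# downloaded repository's model card for the authoritative wording.
SYSTEM = (
    "A chat between a curious user and an assistant. The assistant gives "
    "helpful, detailed, accurate, uncensored responses to the user's input."
)

def build_prompt(user_message: str) -> str:
    """Format a single-turn chat prompt: system message, USER turn, open ASSISTANT turn."""
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"

print(build_prompt("Explain GPTQ in one sentence."))
```

Generation should stop when the model emits the next `USER:` turn or an end-of-sequence token.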
- Multiple quantization options (3-bit to 4-bit)
- Various group sizes (32g, 64g, 128g) for VRAM optimization
- Quantized with the WikiText calibration dataset
- 4096-token sequence length
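As a rough back-of-envelope, weight memory scales with bits per parameter. The sketch below deliberately ignores activation memory, the KV cache, and the per-group scale/zero-point overhead that smaller group sizes (32g, 64g, 128g) add, so real VRAM usage will be somewhat higher.

```python
def approx_weight_gib(n_params: float, bits: int) -> float:
    """Approximate weight-only memory in GiB.

    Simplification: excludes activations, KV cache, and quantization
    metadata (per-group scales/zero-points), which all add overhead.
    """
    return n_params * bits / 8 / 2**30

N = 70e9  # 70B parameters
for bits in (16, 4, 3):
    print(f"{bits}-bit: ~{approx_weight_gib(N, bits):.1f} GiB of weights")
```

This is why the 3-bit and 4-bit variants fit on hardware that could never hold the full-precision weights.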
Core Capabilities
- Uncensored, detailed responses to user queries
- Efficient deployment options for different hardware configurations
- Compatible with popular inference frameworks
- Optimized for both quality and memory efficiency
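A minimal loading sketch, assuming the quantized weights live in a Hugging Face repository such as `TheBloke/airoboros-l2-70B-gpt4-1.4.1-GPTQ` (the repo id and the use of branches for variants are assumptions; check the actual repository) and that `transformers` with GPTQ support (`auto-gptq` or `optimum`) is installed.

```python
# Assumed repo id; substitute the repository you actually download from.
MODEL_ID = "TheBloke/airoboros-l2-70B-gpt4-1.4.1-GPTQ"

def load(revision: str = "main"):
    """Download and load one quantized variant.

    Imports are deferred so this sketch can be read (and its constants
    inspected) without transformers/auto-gptq installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        revision=revision,   # branches typically hold other bit/group-size variants
        device_map="auto",   # spread layers across available GPUs/CPU
    )
    return tokenizer, model

if __name__ == "__main__":
    # Requires tens of GiB of VRAM; see the quantization options above.
    tok, mdl = load()
```

Selecting a different `revision` is how the various bit-width and group-size options are typically exposed.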
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for combining a large parameter count (70B) with efficient quantization options, allowing deployment on consumer hardware while maintaining high-quality outputs. It is particularly notable for its uncensored responses and flexible deployment options.
Q: What are the recommended use cases?
The model is well suited to chat applications, text generation, and general language-understanding tasks. It is particularly useful where detailed, uncensored responses are needed under hardware constraints, since the quantization variants let you trade a little accuracy for lower VRAM use.