Mistral-7B-Instruct-v0.1-AWQ
| Property | Value |
|---|---|
| Parameter Count | 7 billion |
| Quantization | 4-bit AWQ |
| Model Size | 4.15 GB |
| License | Apache 2.0 |
| Author | Mistral AI (original), TheBloke (quantized) |
What is Mistral-7B-Instruct-v0.1-AWQ?
Mistral-7B-Instruct-v0.1-AWQ is a quantized version of the original Mistral-7B-Instruct model, optimized using AWQ (Activation-aware Weight Quantization). It retains the capabilities of the original model while significantly reducing its size and memory requirements through 4-bit quantization with a group size of 128.
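As a rough back-of-the-envelope check (a sketch assuming ~7 billion weights and ignoring embeddings and per-group metadata), the drop from 16-bit to 4-bit weights accounts for most of the reduction:

```python
params = 7e9                  # assumed parameter count (~7B)
fp16_gb = params * 2 / 1e9    # 16-bit weights: ~14 GB
awq_gb = params * 0.5 / 1e9   # 4-bit weights:  ~3.5 GB
print(f"fp16 ~ {fp16_gb:.1f} GB, 4-bit ~ {awq_gb:.1f} GB")
# Per-group scales/zero-points and unquantized tensors account for the
# rest of the reported 4.15 GB on disk.
```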
Implementation Details
The model implements several advanced architectural features, including Grouped-Query Attention, Sliding-Window Attention, and a byte-fallback BPE tokenizer. The AWQ quantization was calibrated on the wikitext dataset with a sequence length of 4096 tokens; a minimal quantization sketch follows the list below.
- Quantization Method: 4-bit AWQ with a group size of 128
- Original Model Size: 7B parameters
- Quantized Size: 4.15 GB
- Supported Frameworks: AutoAWQ
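For context, below is a minimal sketch of how a quantization run with these settings might look in AutoAWQ; the model ids, output path, and calibration details are illustrative assumptions, not the exact script used to produce this checkpoint.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Assumed ids/paths, for illustration only.
base_model = "mistralai/Mistral-7B-Instruct-v0.1"
quant_path = "mistral-7b-instruct-v0.1-awq"

# 4-bit weights with a group size of 128, matching the settings listed above.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# AutoAWQ runs calibration text through the model to choose activation-aware
# scales before rounding weights to 4 bits; per the description above, wikitext
# served as the calibration dataset for this checkpoint.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```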
Core Capabilities
- Efficient inference with reduced memory footprint
- Maintains the original model's instruction-following capabilities
- Supports a context length of 4096 tokens
- Compatible with text generation tasks
- Uses the following instruction format (a usage sketch follows this list):
[INST] prompt [/INST]
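Below is a minimal inference sketch using AutoAWQ; the Hugging Face repository id and generation settings are assumptions chosen for illustration, and a CUDA GPU is assumed.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Assumed repository id for the quantized checkpoint.
model_id = "TheBloke/Mistral-7B-Instruct-v0.1-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True, safetensors=True)

# Wrap the user message in the [INST] ... [/INST] instruction format and keep
# the total prompt within the 4096-token context window.
prompt = "[INST] Summarize what AWQ quantization does in two sentences. [/INST]"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    max_new_tokens=128,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```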
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient 4-bit AWQ quantization, which provides faster inference than comparable GPTQ quantizations while maintaining model quality. It is specifically optimized for GPU inference and was one of the first AWQ-quantized releases of the Mistral architecture.
Q: What are the recommended use cases?
The model is ideal for applications requiring efficient inference on GPU hardware, particularly for instruction-following tasks, text generation, and conversational AI where memory efficiency is crucial but performance cannot be compromised.