Mistral-Small-24B-Instruct-2501-AWQ
| Property | Value |
|---|---|
| Parameter Count | 24 billion |
| Quantization | INT4 (AWQ) |
| Context Length | 32k tokens |
| License | Apache 2.0 |
| Base Model | Mistral-Small-24B-Base-2501 |
What is Mistral-Small-24B-Instruct-2501-AWQ?
This is a quantized version of Mistral's 24B-parameter instruction-tuned language model, compressed to INT4 precision with AutoAWQ while retaining most of the full-precision model's quality. It offers capabilities comparable to much larger models while fitting on consumer hardware such as a single RTX 4090 or a MacBook with 32GB of RAM.
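As a rough sketch of the quantization step, the snippet below shows the standard AutoAWQ recipe for producing an INT4 GEMM checkpoint. The paths and calibration settings (AutoAWQ's defaults) are assumptions; the exact configuration used for this checkpoint is not documented here.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_path = "mistralai/Mistral-Small-24B-Instruct-2501"   # assumed source checkpoint
quant_path = "Mistral-Small-24B-Instruct-2501-AWQ"

# INT4 GEMM settings; these are AutoAWQ defaults, not confirmed for this model
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_path)
tokenizer = AutoTokenizer.from_pretrained(base_path)

# Calibrates on a small text corpus, then rescales and packs the weights to INT4
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```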
Implementation Details
The model uses AWQ (Activation-aware Weight Quantization) to reduce its memory footprint while preserving accuracy. It ships with the Tekken tokenizer (131k-token vocabulary) and follows the V7-Tekken instruction template. It can be deployed with various frameworks, including vLLM and Transformers, and is well suited to production serving.
- Quantization: INT4 GEMM with AutoAWQ
- Tokenizer: Tekken (131k vocab)
- Context Window: 32k tokens
- Deployment Options: vLLM, Transformers, Ollama
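For a quick start, the sketch below runs the quantized checkpoint offline with vLLM. The model path and sampling values are illustrative placeholders rather than settings from this card.

```python
from vllm import LLM, SamplingParams

# Load the AWQ checkpoint; the model id is a placeholder for the actual repo/path
llm = LLM(
    model="Mistral-Small-24B-Instruct-2501-AWQ",
    quantization="awq",
    max_model_len=32768,  # the model's 32k context window
)

params = SamplingParams(temperature=0.15, max_tokens=256)
messages = [{"role": "user", "content": "Summarize AWQ quantization in two sentences."}]

# chat() applies the model's own (V7-Tekken) chat template before generating
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```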
Core Capabilities
- Multilingual support for dozens of languages
- Native function calling and JSON output (see the sketch after this list)
- Strong performance in reasoning and instruction following
- Competitive benchmark results against larger models
- Efficient deployment on consumer hardware
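To illustrate native function calling, the sketch below sends a tool definition to a vLLM OpenAI-compatible server, assumed to be running locally with tool calling enabled. The endpoint, served model name, and the `get_weather` tool are all hypothetical.

```python
from openai import OpenAI

# Assumes a local vLLM server started with tool calling enabled
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Mistral-Small-24B-Instruct-2501-AWQ",  # whatever name the server registers
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # structured call with JSON arguments
```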
Frequently Asked Questions
Q: What makes this model unique?
This model combines strong baseline quality with efficient resource use: AWQ quantization makes it practical to run locally while remaining competitive with larger models. It shows particularly strong results in human evaluations against models such as Gemma-2-27B and Qwen-2.5-32B.
Q: What are the recommended use cases?
The model excels at fast-response conversational agents, low-latency function calling, and, with fine-tuning, subject-matter expertise. It is particularly well suited to organizations that need to keep sensitive data local and to hobbyists who want powerful local AI capabilities.