Mistral-Small-24B-Instruct-2501-AWQ

stelterlab

A 24B-parameter instruction-tuned LLM, AWQ-quantized to INT4, with strong multilingual capabilities, a 32k context window, and best-in-class performance among similar-sized models.

  • Parameter Count: 24 Billion
  • Quantization: INT4 AWQ
  • Context Length: 32k tokens
  • License: Apache 2.0
  • Base Model: Mistral-Small-24B-Base-2501

What is Mistral-Small-24B-Instruct-2501-AWQ?

This is a quantized version of Mistral's 24B-parameter instruction-tuned language model, compressed to INT4 precision with AutoAWQ while largely preserving performance. The model represents a significant advancement in efficient AI deployment, offering capabilities comparable to larger models while fitting on consumer hardware such as a single RTX 4090 or a MacBook with 32GB of RAM.
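The claim that the model fits on consumer hardware follows from simple arithmetic on the quantized weight size (a back-of-the-envelope sketch; it ignores KV cache, activations, and quantization overhead such as scales and zero points):

```python
# Rough weight-memory estimate for a 24B model quantized to INT4.
params = 24e9            # 24 billion parameters
bits_per_weight = 4      # AWQ INT4
weight_bytes = params * bits_per_weight / 8

print(f"~{weight_bytes / 1e9:.0f} GB of weights")  # ~12 GB
```

At roughly 12 GB of weights, the model leaves headroom for the KV cache on a 24GB RTX 4090, whereas the same model in FP16 (~48 GB) would not fit.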

Implementation Details

The model uses AWQ (Activation-aware Weight Quantization) to reduce its memory footprint while preserving performance. It employs the Tekken tokenizer with a 131k vocabulary and follows the V7-Tekken instruction template format. The model can be deployed with several frameworks, including vLLM and Transformers, with specific optimizations for production environments.

  • Quantization: INT4 GEMM with AutoAWQ
  • Tokenizer: Tekken (131k vocab)
  • Context Window: 32k tokens
  • Deployment Options: vLLM, Transformers, Ollama
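As a rough illustration of the V7-Tekken template mentioned above, a single-turn prompt can be assembled by hand. The exact tag layout below (`[SYSTEM_PROMPT]…[/SYSTEM_PROMPT][INST]…[/INST]`) is this author's reading of the published template, so in practice prefer `tokenizer.apply_chat_template`, which lets the tokenizer insert the special tokens itself:

```python
def format_v7_tekken(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the V7-Tekken tag layout.

    This hand-rolled string is for illustration only; the tokenizer's
    chat template is the authoritative source for token placement.
    """
    return f"<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]"

prompt = format_v7_tekken("You are a concise assistant.", "What is AWQ?")
```

The model's completion would follow the closing `[/INST]` tag and end with the EOS token `</s>`.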

Core Capabilities

  • Multilingual support for dozens of languages
  • Native function calling and JSON output
  • Strong performance in reasoning and instruction following
  • Competitive benchmark results against larger models
  • Efficient deployment on consumer hardware
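The native function-calling capability is typically exercised through an OpenAI-compatible endpoint such as the one vLLM serves. The sketch below builds such a request body; the `get_weather` tool and its schema are hypothetical examples, not part of the model card:

```python
import json

# Illustrative request payload in the OpenAI-compatible "tools" schema.
# The tool definition here is a made-up example for demonstration.
payload = {
    "model": "stelterlab/Mistral-Small-24B-Instruct-2501-AWQ",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

body = json.dumps(payload)  # ready to POST to the server's /v1/chat/completions
```

When the model decides to call the tool, the response carries a `tool_calls` entry with JSON arguments matching the declared parameter schema.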

Frequently Asked Questions

Q: What makes this model unique?

This model combines high performance with efficient resource usage through AWQ quantization, making it accessible for local deployment while remaining competitive with larger models. It shows particularly strong results in human evaluations against models such as Gemma-2-27B and Qwen-2.5-32B.

Q: What are the recommended use cases?

The model excels at fast-response conversational agents, low-latency function calling, and serving as a subject-matter expert after fine-tuning. It is particularly well suited to organizations that must keep sensitive data local and to hobbyists seeking powerful local AI capabilities.
