Mistral-Small-24B-Instruct-2501-AWQ

Maintained By
stelterlab


| Property | Value |
|---|---|
| Parameter Count | 24 Billion |
| Quantization | INT4 AWQ |
| Context Length | 32k tokens |
| License | Apache 2.0 |
| Base Model | Mistral-Small-24B-Base-2501 |

What is Mistral-Small-24B-Instruct-2501-AWQ?

This is a quantized version of Mistral's 24B-parameter instruction-tuned language model, compressed to INT4 precision with AutoAWQ while preserving most of the original model's output quality. The result is a notably efficient deployment target: it offers capabilities comparable to much larger models while fitting on consumer hardware such as a single RTX 4090 or a 32 GB MacBook.
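A back-of-the-envelope calculation illustrates why INT4 quantization makes the consumer-hardware claim plausible (weights only; KV cache and activations add overhead on top):

```python
# Rough weight-memory estimate for a 24B-parameter model,
# comparing 16-bit (unquantized) and 4-bit (AWQ) storage.
params = 24e9

fp16_gb = params * 2.0 / 1e9   # 2 bytes per parameter
int4_gb = params * 0.5 / 1e9   # 0.5 bytes per parameter

print(f"FP16 weights: ~{fp16_gb:.0f} GB")  # ~48 GB
print(f"INT4 weights: ~{int4_gb:.0f} GB")  # ~12 GB
```

At roughly 12 GB of weights, the INT4 checkpoint leaves headroom for the KV cache on a 24 GB RTX 4090, whereas the FP16 weights alone would not fit.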

Implementation Details

The model utilizes AWQ (Activation-aware Weight Quantization) technology to reduce the model size while preserving performance. It employs a Tekken tokenizer with a 131k vocabulary size and supports the V7-Tekken instruction template format. The model can be deployed using various frameworks including vLLM and Transformers, with specific optimizations for production environments.

  • Quantization: INT4 GEMM with AutoAWQ
  • Tokenizer: Tekken (131k vocab)
  • Context Window: 32k tokens
  • Deployment Options: vLLM, Transformers, Ollama
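For the vLLM option above, a minimal serving sketch looks like the following. Flag names and defaults vary across vLLM versions, so treat this as an assumption to check against your installed version's documentation:

```shell
# Launch vLLM's OpenAI-compatible server with the AWQ checkpoint.
# Requires a GPU with enough VRAM for the ~12 GB INT4 weights plus KV cache.
vllm serve stelterlab/Mistral-Small-24B-Instruct-2501-AWQ \
    --quantization awq \
    --max-model-len 32768 \
    --port 8000
```

Once the server is up, any OpenAI-compatible client can target `http://localhost:8000/v1`.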

Core Capabilities

  • Multilingual support for dozens of languages
  • Native function calling and JSON output
  • Strong performance in reasoning and instruction following
  • Competitive benchmark results against larger models
  • Efficient deployment on consumer hardware
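Native function calling in this family of models works through OpenAI-style tool schemas: the caller declares a tool, and the model emits JSON arguments for it. The sketch below uses a hypothetical `get_weather` tool and a hand-written example of what the model might return; only the schema shape is the point:

```python
import json

# Illustrative OpenAI-style tool schema (the tool name and fields
# are hypothetical, chosen for demonstration only).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# A model with native function calling returns JSON arguments,
# which the caller parses before dispatching to the real function.
raw_call = '{"city": "Paris"}'  # example model output, not a real response
args = json.loads(raw_call)
print(args["city"])  # Paris
```

The same JSON-mode reliability is what makes the structured-output capability listed above practical: responses can be fed straight into `json.loads` without regex cleanup.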

Frequently Asked Questions

Q: What makes this model unique?

This model uniquely combines high performance with efficient resource usage through AWQ quantization, making it accessible for local deployment while maintaining competitive performance with larger models. It shows particularly strong results in human evaluations against models like Gemma-2-27B and Qwen-2.5-32B.

Q: What are the recommended use cases?

The model excels as a fast-response conversational agent, at low-latency function calling, and as a subject-matter expert via fine-tuning. It is particularly well suited to organizations that must keep sensitive data local and to hobbyists who want powerful local AI capabilities.
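For the conversational-agent use case, prompts follow the V7-Tekken instruction template mentioned earlier. A minimal sketch of hand-building that format is below; the exact control-token strings are an assumption based on the model card, and in practice you should prefer `tokenizer.apply_chat_template`, which applies the correct template automatically:

```python
# Hand-build a V7-Tekken-style prompt (token strings assumed from the
# model card; use tokenizer.apply_chat_template in real code).
def build_prompt(system: str, user: str) -> str:
    return f"<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]"

prompt = build_prompt(
    "You are a concise assistant.",
    "Summarize AWQ quantization in one sentence.",
)
print(prompt)
```

The generation loop then appends the model's reply after the closing `[/INST]` marker.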
