Mistral-Small-24B-Instruct-2501-AWQ

stelterlab

A 24B-parameter instruction-tuned LLM, AWQ-quantized to INT4, with strong multilingual capabilities, a 32k context window, and best-in-class performance among similar-sized models.

  • Parameter Count: 24 Billion
  • Quantization: INT4 AWQ
  • Context Length: 32k tokens
  • License: Apache 2.0
  • Base Model: Mistral-Small-24B-Base-2501

What is Mistral-Small-24B-Instruct-2501-AWQ?

This is a quantized version of Mistral's 24B-parameter instruction-tuned language model, compressed to INT4 precision with AutoAWQ while largely preserving performance. The model represents a significant advancement in efficient AI deployment, offering capabilities comparable to larger models while fitting on consumer hardware such as a single RTX 4090 or a MacBook with 32GB of RAM.
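The claim that the model fits on consumer hardware follows from simple arithmetic on the quantized weight size (a back-of-the-envelope sketch; it ignores KV cache, activations, and quantization overhead such as scales and zero points):

```python
# Rough weight-memory estimate for a 24B model quantized to INT4.
params = 24e9            # 24 billion parameters
bits_per_weight = 4      # AWQ INT4
weight_bytes = params * bits_per_weight / 8

print(f"~{weight_bytes / 1e9:.0f} GB of weights")  # ~12 GB
```

At roughly 12 GB of weights, the model leaves headroom for the KV cache on a 24GB RTX 4090, whereas the same model in FP16 (~48 GB) would not fit.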

Implementation Details

The model uses AWQ (Activation-aware Weight Quantization) to reduce its memory footprint while preserving performance. It employs the Tekken tokenizer with a 131k vocabulary and follows the V7-Tekken instruction template format. The model can be deployed with several frameworks, including vLLM and Transformers, with specific optimizations for production environments.

  • Quantization: INT4 GEMM with AutoAWQ
  • Tokenizer: Tekken (131k vocab)
  • Context Window: 32k tokens
  • Deployment Options: vLLM, Transformers, Ollama
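As a rough illustration of the V7-Tekken template mentioned above, a single-turn prompt can be assembled by hand. The exact tag layout below (`[SYSTEM_PROMPT]…[/SYSTEM_PROMPT][INST]…[/INST]`) is this author's reading of the published template, so in practice prefer `tokenizer.apply_chat_template`, which lets the tokenizer insert the special tokens itself:

```python
def format_v7_tekken(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the V7-Tekken tag layout.

    This hand-rolled string is for illustration only; the tokenizer's
    chat template is the authoritative source for token placement.
    """
    return f"<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][INST]{user}[/INST]"

prompt = format_v7_tekken("You are a concise assistant.", "What is AWQ?")
```

The model's completion would follow the closing `[/INST]` tag and end with the EOS token `</s>`.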

Core Capabilities

  • Multilingual support for dozens of languages
  • Native function calling and JSON output
  • Strong performance in reasoning and instruction following
  • Competitive benchmark results against larger models
  • Efficient deployment on consumer hardware
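The native function-calling capability is typically exercised through an OpenAI-compatible endpoint such as the one vLLM serves. The sketch below builds such a request body; the `get_weather` tool and its schema are hypothetical examples, not part of the model card:

```python
import json

# Illustrative request payload in the OpenAI-compatible "tools" schema.
# The tool definition here is a made-up example for demonstration.
payload = {
    "model": "stelterlab/Mistral-Small-24B-Instruct-2501-AWQ",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

body = json.dumps(payload)  # ready to POST to the server's /v1/chat/completions
```

When the model decides to call the tool, the response carries a `tool_calls` entry with JSON arguments matching the declared parameter schema.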

Frequently Asked Questions

Q: What makes this model unique?

This model combines high performance with efficient resource usage through AWQ quantization, making it accessible for local deployment while remaining competitive with larger models. It shows particularly strong results in human evaluations against models such as Gemma-2-27B and Qwen-2.5-32B.

Q: What are the recommended use cases?

The model excels at fast-response conversational agents, low-latency function calling, and serving as a subject-matter expert after fine-tuning. It is particularly well suited to organizations that must keep sensitive data local and to hobbyists seeking powerful local AI capabilities.
