# Mistral-Small-24B-Instruct-2501
| Property | Value |
|---|---|
| Parameter Count | 24 billion |
| Context Window | 32,000 tokens |
| License | Apache 2.0 |
| Tokenizer | Tekken (131k vocabulary) |
| Model Type | Instruction-tuned LLM |
## What is Mistral-Small-24B-Instruct-2501?
Mistral-Small-24B-Instruct-2501 is a compact but capable language model that delivers performance comparable to much larger models while remaining efficient to run. It is the instruction-fine-tuned version of Mistral-Small-24B-Base-2501, designed to run locally on consumer hardware once quantized.
## Implementation Details
The model uses the Tekken tokenizer (131k vocabulary) and a dedicated instruction template (V7-Tekken, sketched below) for best results. It can be deployed with the vLLM or Transformers frameworks and requires roughly 55 GB of GPU RAM for bf16/fp16 inference.
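For reference, the V7-Tekken instruction template documented on the upstream model card has roughly this shape (angle-bracket placeholders stand in for actual text):

```
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
```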
- Supports both server/client and offline deployment (an offline sketch follows this list)
- Compatible with vLLM >= 0.6.4 and mistral_common >= 1.5.2
- Includes built-in system prompt support
- Features native function calling capabilities
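As a concrete starting point, here is a minimal offline-inference sketch with vLLM. The Mistral-specific loading flags follow the upstream model card; the prompt is illustrative, and the temperature of 0.15 matches the card's recommended setting.

```python
# Minimal offline inference with vLLM (assumes vllm >= 0.6.4 and
# mistral_common >= 1.5.2, per the compatibility notes above).
from vllm import LLM, SamplingParams

# The mistral-format flags load the Tekken tokenizer and consolidated
# weights directly; verify against the docs of your vLLM version.
llm = LLM(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    tokenizer_mode="mistral",
    config_format="mistral",
    load_format="mistral",
)

messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Summarize what the Tekken tokenizer is."},
]

params = SamplingParams(max_tokens=256, temperature=0.15)
outputs = llm.chat(messages, sampling_params=params)
print(outputs[0].outputs[0].text)
```

For server/client use, `vllm serve mistralai/Mistral-Small-24B-Instruct-2501 --tokenizer_mode mistral` exposes the same model behind an OpenAI-compatible HTTP endpoint.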
## Core Capabilities
- Multilingual support for dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish
- Advanced reasoning and conversational abilities
- Agent-centric design with native function calling and JSON output (a request example follows this list)
- 32k context window for handling lengthy inputs
- Local deployment capability on RTX 4090 or 32GB RAM MacBook (when quantized)
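To illustrate the function-calling capability above: once the model is served through vLLM's OpenAI-compatible endpoint, a tool call can be requested with the standard `openai` client. The endpoint URL and the `get_weather` tool below are illustrative assumptions, and the tool-parser flags should be checked against your vLLM version.

```python
# Function-calling sketch against a locally served model.
# Assumes a vLLM server is already running, e.g.:
#   vllm serve mistralai/Mistral-Small-24B-Instruct-2501 \
#     --tokenizer_mode mistral --enable-auto-tool-choice --tool-call-parser mistral
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool definition in the standard OpenAI tools schema,
# which the model's native function calling targets.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decided to call the tool, the structured call appears here.
print(resp.choices[0].message.tool_calls)
```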
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its balance of size and performance: it delivers capabilities comparable to larger models while remaining deployable on consumer hardware. Its knowledge density and efficient architecture make it suitable for both hobbyist and enterprise applications.
Q: What are the recommended use cases?
The model excels as a fast-response conversational agent, at low-latency function calling, and as a base for fine-tuned subject-matter experts. It is particularly valuable for organizations that handle sensitive data and therefore need local inference, as in the quantized-loading sketch below.
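As a sketch of that local-inference path, the checkpoint can be loaded in 4-bit precision with Transformers and bitsandbytes. The memory claim is an assumption that depends on context length, and quantization quality should be validated for your workload.

```python
# 4-bit quantized loading sketch (Transformers + bitsandbytes).
# Roughly fits on a single 24 GB GPU such as an RTX 4090 (assumption;
# headroom depends on context length and batch size).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain your context window limit."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Decode only the newly generated tokens, not the echoed prompt.
out = model.generate(input_ids, max_new_tokens=128, temperature=0.15, do_sample=True)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```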