Mistral-Small-24B-Instruct-2501-ungated

adamo1139

Mistral Small 24B Instruct is a powerful 24B-parameter LLM with a 32k context window, multilingual capabilities, and an Apache 2.0 license. It runs on an RTX 4090 or a 32GB-RAM machine when quantized.

Property          Value
Parameter Count   24 Billion
Context Window    32,000 tokens
License           Apache 2.0
Tokenizer         Tekken (131k vocabulary)
Model Type        Instruction-tuned LLM

What is Mistral-Small-24B-Instruct-2501-ungated?

Mistral-Small-24B-Instruct-2501 represents a significant advancement in compact yet powerful language models, offering performance comparable to larger models while maintaining efficiency. It's an instruction-fine-tuned version of the Mistral-Small-24B-Base-2501, designed to run locally on consumer hardware when properly quantized.

Implementation Details

The model utilizes the Tekken tokenizer with a 131k vocabulary size and implements a specialized instruction template (V7-Tekken) for optimal performance. It can be deployed using vLLM or Transformers frameworks, requiring approximately 60GB of GPU RAM for full-precision inference.

  • Supports both server/client and offline implementations
  • Compatible with vLLM >= 0.6.4 and mistral_common >= 1.5.2
  • Includes built-in system prompt support
  • Features native function calling capabilities
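Because vLLM exposes an OpenAI-compatible chat endpoint, interacting with the served model comes down to posting a standard chat-completions request. The sketch below builds such a request body, including a system message to exercise the built-in system prompt support; the model id and sampling parameters are illustrative assumptions, not values mandated by the model card.

```python
import json

# Sketch of a chat-completions request body for a vLLM server hosting the
# model. The model id and sampling values are assumptions for illustration.
payload = {
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "messages": [
        # The model has built-in system prompt support.
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."},
    ],
    "max_tokens": 256,
    "temperature": 0.15,
}

# This JSON string would be POSTed to the server's /v1/chat/completions route.
body = json.dumps(payload)
```

The same payload shape works for both the server/client and offline deployment paths, since the instruction template (V7-Tekken) is applied by the serving framework, not by the caller.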

Core Capabilities

  • Multilingual support for dozens of languages including English, French, German, Spanish, and more
  • Advanced reasoning and conversational abilities
  • Agent-centric design with function calling and JSON output
  • 32k context window for handling lengthy inputs
  • Local deployment capability on RTX 4090 or 32GB RAM MacBook (when quantized)
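The hardware claims above can be checked with back-of-envelope arithmetic. The sketch below estimates weight memory alone at a few common precisions, ignoring KV cache and runtime overhead, which is why the full-precision figure quoted earlier (~60GB) exceeds the raw weight size.

```python
# Back-of-envelope weight-memory estimate for a 24B-parameter model.
# Weights only: KV cache, activations, and framework overhead add more.
PARAMS = 24e9

def weight_gb(bits_per_param: float) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_gb(bits):.0f} GB")
# fp16/bf16: ~48 GB, int8: ~24 GB, int4: ~12 GB
```

At 4-bit quantization the ~12 GB of weights fits comfortably in an RTX 4090's 24 GB of VRAM or a 32GB-RAM MacBook, consistent with the local-deployment claim.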

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its exceptional balance of size and performance, offering capabilities comparable to larger models while being deployable on consumer hardware. Its knowledge density and efficient architecture make it particularly suitable for both hobbyist and enterprise applications.

Q: What are the recommended use cases?

The model excels at fast-response conversational agents, low-latency function calling, and, when fine-tuned, serving as a subject-matter expert. It is particularly valuable for organizations that handle sensitive data and therefore require local inference.
