Mistral-Small-24B-Instruct-2501-FP8-Dynamic

Maintained By
neuralmagic


  • Model Size: 24B parameters
  • Quantization: FP8 Dynamic
  • Release Date: March 1, 2025
  • Developer: Neural Magic
  • Model URL: neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic

What is Mistral-Small-24B-Instruct-2501-FP8-Dynamic?

This is a quantized version of the Mistral-Small-24B-Instruct-2501 model, optimized using FP8 dynamic quantization to reduce model size while maintaining performance. The model achieves an impressive 99.28% accuracy recovery compared to the original version, while reducing disk space and GPU memory requirements by approximately 50%.
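As a back-of-the-envelope check on that figure: 24B parameters stored at 16 bits per weight occupy roughly 48 GB, while the same parameters at 8 bits occupy roughly 24 GB, which is where the ~50% reduction comes from (activations, KV cache, and runtime overhead add to both numbers).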

Implementation Details

The model implements FP8 quantization for both weights and activations, specifically targeting the linear operators within transformer blocks. It's designed for efficient deployment using the vLLM backend and supports OpenAI-compatible serving. The quantization process preserves the model's performance while significantly reducing resource requirements.
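For orientation, the sketch below shows how an FP8-dynamic checkpoint of this kind is typically produced with Neural Magic's llm-compressor library. The base model ID, the FP8_DYNAMIC scheme targeting Linear layers, and the lm_head ignore list follow common llm-compressor examples rather than this card's published recipe, and import paths can vary between library versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

base_id = "mistralai/Mistral-Small-24B-Instruct-2501"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# FP8 dynamic: quantize weights and activations of the Linear layers inside
# the transformer blocks, leaving the output head in higher precision.
# (Scheme and ignore list are assumptions based on typical llm-compressor recipes.)
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Dynamic activation quantization computes scales at inference time,
# so no calibration dataset is needed for this step.
oneshot(model=model, recipe=recipe)

model.save_pretrained("Mistral-Small-24B-Instruct-2501-FP8-Dynamic")
tokenizer.save_pretrained("Mistral-Small-24B-Instruct-2501-FP8-Dynamic")
```

Because activation scales are computed on the fly, the conversion step stays lightweight compared to calibration-based schemes.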

  • Achieves 78.88 average score on OpenLLM benchmark v1 (original: 79.45)
  • Maintains strong performance across various tasks including ARC-Challenge, GSM8K, and MMLU
  • Supports up to 4096 token context length
  • Compatible with vLLM for efficient deployment (see the usage sketch below)
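A minimal offline-inference sketch with vLLM, assuming the checkpoint is pulled from the Hugging Face Hub; the prompt, sampling settings, and max_model_len value are illustrative choices, not values taken from the card:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic"

# Build a chat-formatted prompt with the model's own chat template.
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# max_model_len and sampling values are illustrative; tune them for your deployment.
llm = LLM(model=model_id, max_model_len=4096)
sampling = SamplingParams(temperature=0.15, max_tokens=256)

outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```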

Core Capabilities

  • Efficient resource utilization with FP8 quantization
  • Strong performance on mathematical reasoning (89.01% on GSM8K)
  • Robust general knowledge (80.55% on MMLU)
  • High accuracy on common sense tasks (84.65% on HellaSwag)

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient FP8 quantization implementation that reduces model size by 50% while maintaining over 99% of the original model's performance. It's specifically optimized for production deployment using vLLM.
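Since vLLM exposes an OpenAI-compatible API, a deployed instance can also be queried with the standard openai Python client. The host, port, and request parameters below are placeholder assumptions for a local test server:

```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is already running locally, e.g.
# started with `vllm serve neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic`.
# Adjust base_url and api_key to match your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic",
    messages=[{"role": "user", "content": "Give a short explanation of FP8 quantization."}],
    temperature=0.15,
    max_tokens=256,
)
print(response.choices[0].message.content)
```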

Q: What are the recommended use cases?

The model is well-suited for deployment scenarios where resource efficiency is critical but high performance is required. It excels in tasks requiring mathematical reasoning, general knowledge, and common sense understanding, making it suitable for various applications from educational tools to general-purpose AI assistants.
