Mistral-Small-24B-Instruct-2501-FP8-Dynamic

Maintained By
neuralmagic


  • Model Size: 24B parameters
  • Quantization: FP8 Dynamic
  • Release Date: March 1, 2025
  • Developer: Neural Magic
  • Model URL: neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic

What is Mistral-Small-24B-Instruct-2501-FP8-Dynamic?

This is a quantized version of the Mistral-Small-24B-Instruct-2501 model, optimized using FP8 dynamic quantization to reduce model size while maintaining performance. The model achieves an impressive 99.28% accuracy recovery compared to the original version, while reducing disk space and GPU memory requirements by approximately 50%.
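As a back-of-the-envelope check on that figure: 24B parameters stored at 16 bits per weight occupy roughly 48 GB, while the same parameters at 8 bits occupy roughly 24 GB, which is where the ~50% reduction comes from (activations, KV cache, and runtime overhead add to both numbers).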

Implementation Details

The model implements FP8 quantization for both weights and activations, specifically targeting the linear operators within transformer blocks. It's designed for efficient deployment using the vLLM backend and supports OpenAI-compatible serving. The quantization process preserves the model's performance while significantly reducing resource requirements.
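For orientation, the sketch below shows how an FP8-dynamic checkpoint of this kind is typically produced with Neural Magic's llm-compressor library. The base model ID, the FP8_DYNAMIC scheme targeting Linear layers, and the lm_head ignore list follow common llm-compressor examples rather than this card's published recipe, and import paths can vary between library versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

base_id = "mistralai/Mistral-Small-24B-Instruct-2501"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# FP8 dynamic: quantize weights and activations of the Linear layers inside
# the transformer blocks, leaving the output head in higher precision.
# (Scheme and ignore list are assumptions based on typical llm-compressor recipes.)
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Dynamic activation quantization computes scales at inference time,
# so no calibration dataset is needed for this step.
oneshot(model=model, recipe=recipe)

model.save_pretrained("Mistral-Small-24B-Instruct-2501-FP8-Dynamic")
tokenizer.save_pretrained("Mistral-Small-24B-Instruct-2501-FP8-Dynamic")
```

Because activation scales are computed on the fly, the conversion step stays lightweight compared to calibration-based schemes.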

  • Achieves 78.88 average score on OpenLLM benchmark v1 (original: 79.45)
  • Maintains strong performance across various tasks including ARC-Challenge, GSM8K, and MMLU
  • Supports up to 4096 token context length
  • Compatible with vLLM for efficient deployment (see the usage sketch below)
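A minimal offline-inference sketch with vLLM, assuming the checkpoint is pulled from the Hugging Face Hub; the prompt, sampling settings, and max_model_len value are illustrative choices, not values taken from the card:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic"

# Build a chat-formatted prompt with the model's own chat template.
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# max_model_len and sampling values are illustrative; tune them for your deployment.
llm = LLM(model=model_id, max_model_len=4096)
sampling = SamplingParams(temperature=0.15, max_tokens=256)

outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```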

Core Capabilities

  • Efficient resource utilization with FP8 quantization
  • Strong performance on mathematical reasoning (89.01% on GSM8K)
  • Robust general knowledge (80.55% on MMLU)
  • High accuracy on common sense tasks (84.65% on HellaSwag)

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient FP8 quantization implementation that reduces model size by 50% while maintaining over 99% of the original model's performance. It's specifically optimized for production deployment using vLLM.
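Since vLLM exposes an OpenAI-compatible API, a deployed instance can also be queried with the standard openai Python client. The host, port, and request parameters below are placeholder assumptions for a local test server:

```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is already running locally, e.g.
# started with `vllm serve neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic`.
# Adjust base_url and api_key to match your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic",
    messages=[{"role": "user", "content": "Give a short explanation of FP8 quantization."}],
    temperature=0.15,
    max_tokens=256,
)
print(response.choices[0].message.content)
```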

Q: What are the recommended use cases?

The model is well-suited for deployment scenarios where resource efficiency is critical but high performance is required. It excels in tasks requiring mathematical reasoning, general knowledge, and common sense understanding, making it suitable for various applications from educational tools to general-purpose AI assistants.
