Mistral-Nemo-Instruct-2407-FP8

neuralmagic

Optimized 12.2B-parameter Mistral model quantized to FP8, offering a 50% memory reduction while maintaining 99.53% of the original model's performance.

Property         Value
Parameter Count  12.2B
License          Apache 2.0
Tensor Type      BF16/F8_E4M3
OpenLLM Score    71.28

What is Mistral-Nemo-Instruct-2407-FP8?

Mistral-Nemo-Instruct-2407-FP8 is an optimized version of the original Mistral-Nemo-Instruct model, specifically designed for efficient deployment while maintaining high performance. Through FP8 quantization, it achieves approximately 50% reduction in disk size and GPU memory requirements compared to the original model, while preserving 99.53% of its performance.

Implementation Details

The model's key optimization is its quantization scheme: symmetric per-tensor quantization of both the weights and activations of linear operators within transformer blocks, using the FP8 data type. Quantization was performed with the AutoFP8 framework, calibrated on 512 sequences from the UltraChat dataset.
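To make the scheme concrete, here is a minimal sketch of symmetric per-tensor FP8 (E4M3) quantization in NumPy. It models the two essential steps — computing a single scale from the tensor's absolute maximum, then rounding each scaled value to a 4-bit significand — while omitting details a real kernel handles (exponent-range clamping, subnormals, saturation modes). All function names here are illustrative, not part of AutoFP8's API.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def round_significand(v: np.ndarray) -> np.ndarray:
    """Round to a 4-bit significand (1 implicit bit + 3 mantissa bits),
    approximating E4M3 rounding for normal values."""
    m, e = np.frexp(v)            # v = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 16.0) / 16.0  # keep 4 bits of significand precision
    return np.ldexp(m, e)


def quantize_per_tensor(x: np.ndarray):
    """Symmetric per-tensor quantization: one scale for the whole tensor,
    chosen so the largest |value| maps exactly to the FP8 maximum."""
    scale = np.abs(x).max() / FP8_E4M3_MAX
    q = round_significand(np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX))
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale


np.random.seed(0)
w = np.random.randn(64, 64).astype(np.float32)  # stand-in for a weight matrix
q, s = quantize_per_tensor(w)
w_hat = dequantize(q, s)
# A 4-bit significand bounds the per-element relative error at ~6.25%,
# which is why FP8 weights track the original model so closely.
```

Because the scheme is symmetric (no zero-point) and per-tensor (one scale per weight matrix), dequantization is a single multiply, which keeps the runtime overhead in inference kernels minimal.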

  • Weight and activation quantization to FP8
  • Compatible with vLLM >= 0.5.0
  • 4096 token context window
  • Optimized for commercial and research applications
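Given the vLLM compatibility noted above, the checkpoint can be served through vLLM's OpenAI-compatible API server. A minimal deployment fragment (the Hugging Face model identifier and flag values are illustrative; serving requires a GPU with enough memory for the FP8 weights):

```shell
pip install "vllm>=0.5.0"

python -m vllm.entrypoints.openai.api_server \
  --model neuralmagic/Mistral-Nemo-Instruct-2407-FP8 \
  --max-model-len 4096
```

Once the server is up, any OpenAI-compatible client can send chat requests to it, so existing assistant-style applications need no code changes beyond pointing at the new endpoint.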

Core Capabilities

  • Achieves a 71.28 average score on the OpenLLM benchmark
  • Strong results across tasks: MMLU (68.50%), GSM-8K (73.01%), HellaSwag (84.18%)
  • Supports efficient deployment through the vLLM backend
  • Specialized for English-language tasks and assistant-style chat applications

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its efficient FP8 quantization that reduces resource requirements by 50% while maintaining over 99% of the original model's performance, making it particularly suitable for production deployment.

Q: What are the recommended use cases?

The model is optimized for English language applications, particularly in commercial and research contexts requiring assistant-like chat functionality. It's specifically designed for deployment scenarios where resource efficiency is crucial without compromising performance.
