Mistral-Nemo-Instruct-FP8-2407

Maintained By
mistralai

  • License: Apache 2.0
  • Context Window: 128k tokens
  • Model Type: Instruction-tuned LLM
  • Architecture: 40-layer Transformer with GQA
  • Model URL: HuggingFace

What is Mistral-Nemo-Instruct-FP8-2407?

Mistral-Nemo-Instruct-FP8-2407 is an FP8-quantized, instruction-tuned language model developed jointly by Mistral AI and NVIDIA, built on the Mistral-Nemo-Base-2407 architecture. It offers strong multilingual capabilities and is designed as a drop-in replacement for Mistral 7B, delivering better benchmark performance at a lower deployment cost.

Implementation Details

The model uses a 40-layer transformer with a hidden dimension of 5,120 and 32 attention heads (8 KV heads via grouped-query attention). It employs SwiGLU activations and rotary embeddings with theta = 1M, has a vocabulary of approximately 128k tokens, and supports a 128k-token context window.

  • Multilingual and code-focused training data
  • FP8 quantization for efficient deployment
  • GQA (Grouped-Query Attention) implementation
  • 128k context window support

Core Capabilities

  • Strong performance on various benchmarks (83.5% on HellaSwag, 76.8% on Winogrande)
  • Multilingual proficiency across 8+ languages with MMLU scores ranging from 59-65%
  • Efficient processing through FP8 quantization
  • Compatible with vLLM library for deployment
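Since the card recommends vLLM for deployment, a minimal serving sketch looks like the following. The model ID and flags are illustrative (use the FP8 checkpoint's actual HuggingFace repository path, and check your vLLM version's flag names):

```shell
pip install vllm

# Launch an OpenAI-compatible API server.
# Replace the model ID with the FP8 checkpoint's actual repository path;
# --max-model-len can be raised toward 128k given sufficient GPU memory.
vllm serve mistralai/Mistral-Nemo-Instruct-2407 \
  --tokenizer-mode mistral \
  --max-model-len 32768
```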

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its combination of efficient quantization, extensive context window, and strong multilingual capabilities, all while maintaining competitive performance across various benchmarks. Its Apache 2.0 license also makes it accessible for commercial use.

Q: What are the recommended use cases?

The model is well-suited for multilingual applications, general text generation, and instruction-following tasks. Its large context window makes it particularly useful for processing lengthy documents, while its efficient quantization enables deployment in resource-constrained environments.
