Mistral-NeMo-Minitron-8B-Base
| Property | Value |
|---|---|
| Parameter Count | 8.41B |
| Model Type | Transformer Decoder (Auto-Regressive) |
| Architecture | Mistral-NeMo with GQA and RoPE |
| License | NVIDIA Open Model License |
| Paper | Technical Report |
What is Mistral-NeMo-Minitron-8B-Base?
Mistral-NeMo-Minitron-8B-Base is a language model developed by NVIDIA by pruning the larger Mistral-NeMo 12B model and distilling knowledge from it. Trained between July and August 2024, it retains strong accuracy while substantially reducing the parameter count, making it a notable step in model efficiency.
Implementation Details
The model uses an embedding dimension of 4096, 32 attention heads, 40 transformer layers, and an MLP intermediate dimension of 11520, and it employs Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). Weights are stored in BFloat16 precision, and the model works best with inputs of up to 8k characters. A minimal loading sketch follows the list below.
- Trained on 380 billion tokens
- Supports multiple runtime engines including NeMo 24.05
- Compatible with NVIDIA's latest GPU architectures
- Produced by pruning the 12B model and retraining with knowledge distillation
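For reference, here is a minimal loading sketch using Hugging Face Transformers in BFloat16. It assumes the checkpoint is published under the `nvidia/Mistral-NeMo-Minitron-8B-Base` identifier and that `transformers`, `torch`, and `accelerate` are installed; treat it as an illustration rather than official usage instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model identifier for this checkpoint.
model_id = "nvidia/Mistral-NeMo-Minitron-8B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model card lists BFloat16 precision
    device_map="auto",           # requires the accelerate package
)

# Quick sanity check against the architecture numbers quoted above.
cfg = model.config
print(cfg.hidden_size, cfg.num_attention_heads, cfg.num_hidden_layers, cfg.intermediate_size)
# Expected per the model card: 4096 32 40 11520
```

The same `model` and `tokenizer` objects can then be used with `model.generate` for completion-style prompting, as shown in the example at the end of this page.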
Core Capabilities
- Strong performance in language understanding (69.5 on MMLU)
- Strong commonsense reasoning (83.0 on HellaSwag, 80.4 on Winogrande)
- Solid code generation (43.77 on MBPP)
- Effective at text generation and completion tasks
Frequently Asked Questions
Q: What makes this model unique?
It stands out for its compact architecture, obtained through systematic pruning and distillation, which delivers accuracy comparable to that of larger models at a lower compute and memory cost.
Q: What are the recommended use cases?
The model excels in various natural language generation tasks, including text completion, language understanding, and code generation. It's particularly suitable for applications requiring a balance between model size and performance.
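As an illustration of the completion-style use cases above, here is a short prompting sketch using the Transformers `pipeline` API. The model id is assumed as before, and the prompt and generation settings are arbitrary examples, not recommendations from the model card.

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="nvidia/Mistral-NeMo-Minitron-8B-Base",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A base (non-instruct) model is prompted with plain text to continue,
# e.g. a function signature for code completion.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```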