Mistral-NeMo-Minitron-8B-Base
| Property | Value |
|---|---|
| Parameter Count | 8.41B |
| Model Type | Transformer Decoder (Auto-Regressive) |
| Architecture | Mistral-NeMo with GQA and RoPE |
| License | NVIDIA Open Model License |
| Paper | Technical Report |
What is Mistral-NeMo-Minitron-8B-Base?
Mistral-NeMo-Minitron-8B-Base is a language model developed by NVIDIA by pruning the larger Mistral-NeMo 12B model and distilling knowledge from it. Trained between July and August 2024, it retains strong accuracy while substantially reducing the parameter count, making it a notable step in model efficiency.
Implementation Details
The model uses an embedding dimension of 4096, 32 attention heads, 40 transformer layers, and an MLP intermediate dimension of 11520, and it employs Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). Weights are stored in BFloat16 precision, and the model works best with inputs of up to 8k characters. A minimal loading sketch follows the list below.
- Trained on 380 billion tokens
- Supports multiple runtime engines including NeMo 24.05
- Compatible with NVIDIA's latest GPU architectures
- Produced by pruning the 12B model and retraining with knowledge distillation
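For reference, here is a minimal loading sketch using Hugging Face Transformers in BFloat16. It assumes the checkpoint is published under the `nvidia/Mistral-NeMo-Minitron-8B-Base` identifier and that `transformers`, `torch`, and `accelerate` are installed; treat it as an illustration rather than official usage instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model identifier for this checkpoint.
model_id = "nvidia/Mistral-NeMo-Minitron-8B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model card lists BFloat16 precision
    device_map="auto",           # requires the accelerate package
)

# Quick sanity check against the architecture numbers quoted above.
cfg = model.config
print(cfg.hidden_size, cfg.num_attention_heads, cfg.num_hidden_layers, cfg.intermediate_size)
# Expected per the model card: 4096 32 40 11520
```

The same `model` and `tokenizer` objects can then be used with `model.generate` for completion-style prompting, as shown in the example at the end of this page.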
Core Capabilities
- Strong performance in language understanding (69.5 on MMLU)
- Strong commonsense reasoning (83.0 on HellaSwag, 80.4 on Winogrande)
- Solid code generation (43.77 on MBPP)
- Effective at text generation and completion tasks
Frequently Asked Questions
Q: What makes this model unique?
It stands out for its compact architecture, obtained through systematic pruning and distillation, which delivers accuracy comparable to that of larger models at a lower compute and memory cost.
Q: What are the recommended use cases?
The model excels in various natural language generation tasks, including text completion, language understanding, and code generation. It's particularly suitable for applications requiring a balance between model size and performance.
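As an illustration of the completion-style use cases above, here is a short prompting sketch using the Transformers `pipeline` API. The model id is assumed as before, and the prompt and generation settings are arbitrary examples, not recommendations from the model card.

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="nvidia/Mistral-NeMo-Minitron-8B-Base",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A base (non-instruct) model is prompted with plain text to continue,
# e.g. a function signature for code completion.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```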