Minitron-8B-Base

nvidia

Minitron-8B-Base is an 8B-parameter LLM from NVIDIA, derived from Nemotron-4 15B through pruning and distillation. It achieves an MMLU score of 64.5.

Property: Value
Model Size: 8B parameters
Developer: NVIDIA
License: NVIDIA Open Model License
Research Paper: arXiv:2407.14679
Training Period: February 2024 - June 2024

What is Minitron-8B-Base?

Minitron-8B-Base is a large language model that NVIDIA derived by pruning the larger Nemotron-4 15B model and then retraining the pruned model with knowledge distillation. Its main appeal is training efficiency: according to the research paper, deriving the model this way requires up to 40x fewer training tokens than training from scratch, while performance remains competitive with models such as Mistral 7B and Gemma 7B.

Implementation Details

The model uses an embedding size of 4096, 48 attention heads, and an MLP intermediate dimension of 16384. It implements Grouped-Query Attention (GQA), in which groups of query heads share key/value heads to reduce memory use during inference, and Rotary Position Embeddings (RoPE), which encode token positions by rotating the query and key vectors.

  • Architecture: Transformer Decoder (auto-regressive language model)
  • Network Base: Nemotron-4
  • Training Data: 94 billion tokens
  • Input/Output: Text-based string format

Core Capabilities

  • MMLU Score: 64.5 (5-shot)
  • HellaSwag: 81.6 (zero-shot)
  • GSM8K: 54.2 (zero-shot)
  • Code Generation: 31.6 (HumanEval pass@1, zero-shot)
  • Multilingual text support, plus code generation
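The HumanEval figure above is a pass@1 score (abbreviated p@1). As a rough illustration of what that metric measures, here is a sketch of the commonly used unbiased pass@k estimator, which takes n generated samples per problem, of which c pass the unit tests. Whether the reported 31.6 was computed with exactly this estimator and sampling setup is an assumption; the page does not say. The function name `pass_at_k` is chosen here for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: total samples generated for a problem
    c: number of those samples that pass the tests
    k: sample budget being evaluated (k = 1 for pass@1)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k includes a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n per problem, so a benchmark-level pass@1 is the average fraction of correct samples across problems, usually reported as a percentage.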

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its training efficiency: it achieves performance comparable to larger models while requiring significantly fewer computational resources. According to the research paper, the pruning and distillation process yields 1.8x compute cost savings for training the full model family, compared with training each model from scratch.

Q: What are the recommended use cases?

The model is intended for research and development, covering tasks such as language understanding, code generation, and general text generation. Users should be aware that it may produce toxic content or reflect societal biases.
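For the text generation use case, a minimal sketch of loading the model with the Hugging Face Transformers library might look as follows. The Hub identifier `nvidia/Minitron-8B-Base`, the bfloat16 dtype, and the helper name `generate` are assumptions not taken from this page, and a sufficiently recent `transformers` release with Nemotron support is assumed. As a base (non-instruction-tuned) model, it is prompted with plain text to continue rather than with chat messages.

```python
# ASSUMPTION: Hub identifier of the checkpoint; not stated on this page.
MODEL_ID = "nvidia/Minitron-8B-Base"


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Load the model and return the prompt plus its generated continuation."""
    # Heavy imports are kept inside the function so that importing this
    # module does not require torch or trigger a model download.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumed dtype; adjust to the hardware
    ).to(device)

    # Tokenize the text prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("To be or not to be,")` would download the weights on first use, which requires substantial disk space and memory for an 8B-parameter model.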
