Minitron-4B-Base

  • Developer: NVIDIA
  • Model Size: 4B parameters
  • License: NVIDIA Open Model License
  • Research Paper: arXiv:2407.14679
  • Training Period: February 2024 - June 2024

What is Minitron-4B-Base?

Minitron-4B-Base is a large language model obtained by pruning the larger Nemotron-4 15B model and retraining the result with distillation. What makes it noteworthy is that it achieves performance comparable to larger models while requiring significantly fewer computational resources during training, reflecting NVIDIA's focus on building more efficient AI systems without sacrificing quality.

Implementation Details

The model architecture uses a 3072 embedding size, 32 attention heads, and a 9216 MLP intermediate dimension, and incorporates Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
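
These hyperparameters can be inspected directly from the published checkpoint configuration. A minimal sketch using the Hugging Face transformers API, assuming the hub id nvidia/Minitron-4B-Base and standard config field names (older transformers versions may require trust_remote_code=True):

```python
from transformers import AutoConfig

# Inspect the architecture hyperparameters described above.
# Assumes the hub id "nvidia/Minitron-4B-Base" and standard field names;
# trust_remote_code=True may be needed on older transformers versions.
config = AutoConfig.from_pretrained("nvidia/Minitron-4B-Base")

print(config.hidden_size)          # expected: 3072 (embedding size)
print(config.num_attention_heads)  # expected: 32
print(config.intermediate_size)    # expected: 9216 (MLP dimension)
print(config.num_key_value_heads)  # GQA: fewer KV heads than attention heads
```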

  • Requires 40x fewer training tokens compared to training from scratch (see the distillation sketch following this list)
  • Achieves 1.8x compute cost savings for the full model family
  • Performs comparably to larger models like Mistral 7B and Gemma 7B
  • Shows up to 16% improvement in MMLU scores compared to training from scratch
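
The token-efficiency figures above reflect that the pruned student is retrained with knowledge distillation from the original teacher rather than purely from scratch. One common form of such a distillation objective is a KL divergence between the teacher's and student's softened token distributions; the sketch below is illustrative, and the function name, temperature, and scaling are assumptions, not details from the paper:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a logit-distillation loss: the pruned student learns
# to match the teacher's softened next-token distributions via KL divergence.
# Temperature and T^2 rescaling are conventional choices, not the paper's exact recipe.
def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```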

Core Capabilities

  • Strong performance on MMLU with 58.6% accuracy (5-shot)
  • Achieves 75.0% on HellaSwag and 74.0% on Winogrande (zero-shot)
  • Code generation capabilities with 23.3% pass@1 on HumanEval
  • Multilingual support including English text and code processing
  • Efficient inference using TensorRT-LLM on NVIDIA hardware (a minimal transformers-based generation sketch follows this list)
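
For quick experimentation outside TensorRT-LLM, the checkpoint can also be exercised through the Hugging Face transformers API. A minimal sketch, assuming the hub id nvidia/Minitron-4B-Base and a recent transformers release with Nemotron support (older versions may need trust_remote_code=True):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal generation sketch; hub id and dtype choice are assumptions.
model_id = "nvidia/Minitron-4B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The key idea behind model pruning is",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```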

Frequently Asked Questions

Q: What makes this model unique?

The model's key distinction lies in its efficient pruning approach, which maintains performance while significantly reducing computational requirements. This is achieved through careful reduction of embedding size, attention heads, and MLP dimensions, followed by distillation training.
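
To make the pruning step concrete, here is a hedged illustration of activation-based width pruning in the spirit of the approach described above: score each MLP neuron by its activation magnitude on a small calibration set, keep the most important ones, and slice the projection matrices accordingly. All names are illustrative, and the bias-free ReLU MLP is a simplifying assumption, not NVIDIA's exact recipe:

```python
import torch

# Hedged sketch of activation-based MLP width pruning.
# Assumes a bias-free, ReLU-style MLP: down_proj(relu(up_proj(x))).
def prune_mlp_neurons(up_proj: torch.nn.Linear,
                      down_proj: torch.nn.Linear,
                      calib_hidden: torch.Tensor,
                      keep: int):
    # Importance of neuron i = mean |activation_i| over calibration tokens.
    acts = torch.relu(calib_hidden @ up_proj.weight.T)
    importance = acts.abs().mean(dim=0)
    keep_idx = importance.topk(keep).indices.sort().values

    # Slice both projections down to the retained neurons.
    pruned_up = torch.nn.Linear(up_proj.in_features, keep, bias=False)
    pruned_up.weight.data = up_proj.weight.data[keep_idx]
    pruned_down = torch.nn.Linear(keep, down_proj.out_features, bias=False)
    pruned_down.weight.data = down_proj.weight.data[:, keep_idx]
    return pruned_up, pruned_down
```

After pruning along each axis (embedding channels, attention heads, MLP neurons), the shrunken model is retrained with the distillation objective sketched earlier to recover accuracy.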

Q: What are the recommended use cases?

The model is designed for research and development purposes, excelling at tasks such as language understanding, code generation, and general text generation. It is particularly suitable for applications where computational efficiency is crucial but competitive performance must be maintained.
