Minitron-4B-Base

  • Developer: NVIDIA
  • Model Size: 4B parameters
  • License: NVIDIA Open Model License
  • Research Paper: arXiv:2407.14679
  • Training Period: February 2024 - June 2024

What is Minitron-4B-Base?

Minitron-4B-Base is a large language model obtained by pruning the larger Nemotron-4 15B model and retraining the result with distillation. What makes it noteworthy is that it achieves performance comparable to larger models while requiring significantly fewer computational resources during training, reflecting NVIDIA's focus on building more efficient AI systems without sacrificing quality.

Implementation Details

The model architecture uses a 3072 embedding size, 32 attention heads, and a 9216 MLP intermediate dimension, and incorporates Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
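
These hyperparameters can be inspected directly from the published checkpoint configuration. A minimal sketch using the Hugging Face transformers API, assuming the hub id nvidia/Minitron-4B-Base and standard config field names (older transformers versions may require trust_remote_code=True):

```python
from transformers import AutoConfig

# Inspect the architecture hyperparameters described above.
# Assumes the hub id "nvidia/Minitron-4B-Base" and standard field names;
# trust_remote_code=True may be needed on older transformers versions.
config = AutoConfig.from_pretrained("nvidia/Minitron-4B-Base")

print(config.hidden_size)          # expected: 3072 (embedding size)
print(config.num_attention_heads)  # expected: 32
print(config.intermediate_size)    # expected: 9216 (MLP dimension)
print(config.num_key_value_heads)  # GQA: fewer KV heads than attention heads
```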

  • Requires 40x fewer training tokens compared to training from scratch (see the distillation sketch following this list)
  • Achieves 1.8x compute cost savings for the full model family
  • Performs comparably to larger models like Mistral 7B and Gemma 7B
  • Shows up to 16% improvement in MMLU scores compared to training from scratch
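
The token-efficiency figures above reflect that the pruned student is retrained with knowledge distillation from the original teacher rather than purely from scratch. One common form of such a distillation objective is a KL divergence between the teacher's and student's softened token distributions; the sketch below is illustrative, and the function name, temperature, and scaling are assumptions, not details from the paper:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a logit-distillation loss: the pruned student learns
# to match the teacher's softened next-token distributions via KL divergence.
# Temperature and T^2 rescaling are conventional choices, not the paper's exact recipe.
def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```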

Core Capabilities

  • Strong performance on MMLU with 58.6% accuracy (5-shot)
  • Achieves 75.0% on HellaSwag and 74.0% on Winogrande (zero-shot)
  • Code generation capabilities with 23.3% pass@1 on HumanEval
  • Multilingual support including English text and code processing
  • Efficient inference using TensorRT-LLM on NVIDIA hardware (a minimal transformers-based generation sketch follows this list)
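
For quick experimentation outside TensorRT-LLM, the checkpoint can also be exercised through the Hugging Face transformers API. A minimal sketch, assuming the hub id nvidia/Minitron-4B-Base and a recent transformers release with Nemotron support (older versions may need trust_remote_code=True):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal generation sketch; hub id and dtype choice are assumptions.
model_id = "nvidia/Minitron-4B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The key idea behind model pruning is",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```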

Frequently Asked Questions

Q: What makes this model unique?

The model's key distinction lies in its efficient pruning approach, which maintains performance while significantly reducing computational requirements. This is achieved through careful reduction of embedding size, attention heads, and MLP dimensions, followed by distillation training.
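
To make the pruning step concrete, here is a hedged illustration of activation-based width pruning in the spirit of the approach described above: score each MLP neuron by its activation magnitude on a small calibration set, keep the most important ones, and slice the projection matrices accordingly. All names are illustrative, and the bias-free ReLU MLP is a simplifying assumption, not NVIDIA's exact recipe:

```python
import torch

# Hedged sketch of activation-based MLP width pruning.
# Assumes a bias-free, ReLU-style MLP: down_proj(relu(up_proj(x))).
def prune_mlp_neurons(up_proj: torch.nn.Linear,
                      down_proj: torch.nn.Linear,
                      calib_hidden: torch.Tensor,
                      keep: int):
    # Importance of neuron i = mean |activation_i| over calibration tokens.
    acts = torch.relu(calib_hidden @ up_proj.weight.T)
    importance = acts.abs().mean(dim=0)
    keep_idx = importance.topk(keep).indices.sort().values

    # Slice both projections down to the retained neurons.
    pruned_up = torch.nn.Linear(up_proj.in_features, keep, bias=False)
    pruned_up.weight.data = up_proj.weight.data[keep_idx]
    pruned_down = torch.nn.Linear(keep, down_proj.out_features, bias=False)
    pruned_down.weight.data = down_proj.weight.data[:, keep_idx]
    return pruned_up, pruned_down
```

After pruning along each axis (embedding channels, attention heads, MLP neurons), the shrunken model is retrained with the distillation objective sketched earlier to recover accuracy.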

Q: What are the recommended use cases?

The model is designed for research and development purposes, excelling at tasks such as language understanding, code generation, and general text generation. It is particularly suitable for applications where computational efficiency is crucial but competitive performance must be maintained.
