Llama-3.1-Minitron-4B-Width-Base
| Property | Value |
|---|---|
| Parameter Count | 4.51B |
| Model Type | Transformer Decoder (Auto-Regressive) |
| Architecture | Llama-3.1 with GQA and RoPE |
| License | NVIDIA Open Model License |
| Research Paper | Technical Report |
| Training Period | July 29, 2024 - Aug 3, 2024 |
What is Llama-3.1-Minitron-4B-Width-Base?
Llama-3.1-Minitron-4B-Width-Base is a language model that NVIDIA derived from the larger Llama-3.1-8B model through structured width pruning. It preserves much of the parent model's performance while reducing computational requirements by shrinking the embedding (hidden) dimension and the MLP intermediate dimension.
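As a rough illustration of what width pruning involves (a minimal sketch under simplifying assumptions, not NVIDIA's actual procedure; Llama's MLP is gated and uses SiLU rather than the plain ReLU shown here), one common approach scores each MLP intermediate neuron by its average activation magnitude on calibration data and keeps only the highest-scoring channels:

```python
# Hypothetical sketch of activation-based width pruning for one MLP block.
# Names, the ReLU nonlinearity, and the selection rule are illustrative only.
import torch
import torch.nn as nn

def prune_mlp_width(up_proj: nn.Linear, down_proj: nn.Linear,
                    calib_inputs: torch.Tensor, keep: int):
    """Keep the `keep` intermediate neurons with the largest mean |activation|."""
    with torch.no_grad():
        acts = torch.relu(up_proj(calib_inputs))         # [num_tokens, intermediate]
        importance = acts.abs().mean(dim=0)               # one score per neuron
        idx = importance.topk(keep).indices.sort().values # indices of kept channels

        new_up = nn.Linear(up_proj.in_features, keep, bias=False)
        new_down = nn.Linear(keep, down_proj.out_features, bias=False)
        new_up.weight.copy_(up_proj.weight[idx, :])        # select output rows
        new_down.weight.copy_(down_proj.weight[:, idx])    # select input columns
    return new_up, new_down
```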
Implementation Details
The model uses 32 transformer layers with a 3072-dimensional embedding (hidden) size, 32 attention heads, and a 9216-dimensional MLP intermediate layer. It employs Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
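These dimensions correspond roughly to the Hugging Face `LlamaConfig` sketched below; the GQA key/value head count and RoPE base are assumptions typical of Llama-3.1, not values stated in this card:

```python
from transformers import LlamaConfig

# Approximate configuration implied by the description above.
config = LlamaConfig(
    hidden_size=3072,             # embedding (hidden) dimension
    intermediate_size=9216,       # MLP intermediate dimension
    num_hidden_layers=32,         # transformer layers
    num_attention_heads=32,       # query heads
    num_key_value_heads=8,        # GQA key/value heads (assumed, not stated above)
    max_position_embeddings=8192,
    rope_theta=500000.0,          # Llama-3.1-style RoPE base (assumed)
)
```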
- BFloat16 precision for a good performance-efficiency balance (see the loading sketch after this list)
- Supports input lengths up to 8k characters
- Trained on 94 billion tokens via knowledge distillation from the Llama-3.1-8B teacher
- Compatible with NVIDIA Ampere, Blackwell, Hopper, and Lovelace architectures
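A minimal loading and generation sketch using the `transformers` library is shown below; the repository id is assumed to match the model's published name, so adjust it to the checkpoint you actually use:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repository id; substitute your own checkpoint path if needed.
model_id = "nvidia/Llama-3.1-Minitron-4B-Width-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the BF16 precision noted above
    device_map="auto",
)

prompt = "The key advantage of structured pruning is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```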
Core Capabilities
- Achieves 60.5 on Massive Multitask Language Understanding (5-shot)
- Strong zero-shot performance: 76.1 on HellaSwag, 73.5 on Winogrande
- 41.2 score on GSM8K for mathematical reasoning (a few-shot prompting sketch follows this list)
- 32.0 score on MBPP for code generation tasks
- Multilingual support with emphasis on English content
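Scores such as the 5-shot MMLU and GSM8K results above come from few-shot prompting. The sketch below (reusing `model` and `tokenizer` from the loading example) shows the general shape of such a prompt; the exemplars are invented for illustration, and evaluation harnesses build these prompts automatically:

```python
# Invented exemplars in the style of a GSM8K few-shot prompt.
few_shot_examples = [
    ("If a pen costs 3 dollars, how much do 4 pens cost?", "4 * 3 = 12. The answer is 12."),
    ("Tom has 10 apples and gives away 4. How many remain?", "10 - 4 = 6. The answer is 6."),
]
question = "A train travels 60 miles per hour for 3 hours. How far does it go?"

prompt = ""
for q, a in few_shot_examples:
    prompt += f"Question: {q}\nAnswer: {a}\n\n"
prompt += f"Question: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```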
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient architecture achieved through systematic width pruning and knowledge distillation, making it particularly suitable for commercial applications while maintaining strong performance metrics.
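The distillation step is commonly implemented as a KL-divergence loss between teacher and student logits. The snippet below is a generic sketch of that idea, not the exact Minitron training recipe:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 1.0):
    """KL(teacher || student) over token distributions.

    Generic logit-distillation sketch; the actual recipe may combine several
    losses and use a different temperature schedule.
    """
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Scaling by t^2 is the conventional correction for temperature-softened logits.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)
```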
Q: What are the recommended use cases?
The model excels in various natural language generation tasks, particularly within commercial environments. It's optimized for text generation, comprehension, and code-related tasks, with effective performance in both zero-shot and few-shot scenarios.
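Because this is a base (non-instruct) model, code-related prompting works best as plain completion. A simple zero-shot example, reusing `model` and `tokenizer` from the loading sketch above (the function and docstring are purely illustrative):

```python
# Provide a function signature and docstring; the model completes the body.
prompt = (
    "def is_palindrome(s: str) -> bool:\n"
    '    """Return True if s reads the same forwards and backwards."""\n'
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```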