Llama-3.1-Minitron-4B-Width-Base

by NVIDIA

A 4.51B parameter LLM derived from Llama-3.1-8B through width pruning, featuring 32 attention heads and 32 layers. Optimized for commercial use with NVIDIA hardware.

  • Parameter Count: 4.51B
  • Model Type: Transformer Decoder (Auto-Regressive)
  • Architecture: Llama-3.1 with GQA and RoPE
  • License: NVIDIA Open Model License
  • Research Paper: Technical Report
  • Training Period: July 29, 2024 – Aug 3, 2024

What is Llama-3.1-Minitron-4B-Width-Base?

Llama-3.1-Minitron-4B-Width-Base is a language model that NVIDIA derived from the larger Llama-3.1-8B model by width pruning: the embedding and MLP dimensions of the parent were reduced, and the pruned model was then retrained with knowledge distillation. The result retains much of the parent's accuracy at a substantially lower parameter count and computational cost.

Implementation Details

The model features a carefully optimized architecture with 3072 embedding dimensions, 32 attention heads, and a 9216-dimensional MLP intermediate layer across 32 transformer layers. It employs advanced techniques like Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE) for enhanced performance.
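The GQA mechanism mentioned above lets several query heads share one key/value head, shrinking the KV cache. A minimal NumPy sketch using this model's head counts (32 query heads, 8 KV heads are this model's published values; the head dimension of 128 is assumed to be inherited from the Llama-3.1 parent, and causal masking and RoPE are omitted for brevity):

```python
import numpy as np

def gqa_attention(q, k, v, n_query_heads=32, n_kv_heads=8):
    """Grouped-Query Attention: each KV head is shared by a group of
    n_query_heads / n_kv_heads = 4 query heads.
    Shapes: q is (n_query_heads, seq, hd); k and v are (n_kv_heads, seq, hd)."""
    group = n_query_heads // n_kv_heads
    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=0)            # -> (n_query_heads, seq, hd)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)  # softmax over key positions
    return weights @ v                         # (n_query_heads, seq, hd)

rng = np.random.default_rng(0)
seq, hd = 4, 128  # head dim 128 is an assumption carried over from Llama-3.1-8B
q = rng.standard_normal((32, seq, hd))
k = rng.standard_normal((8, seq, hd))
v = rng.standard_normal((8, seq, hd))
out = gqa_attention(q, k, v)
print(out.shape)  # (32, 4, 128)
```

With 8 KV heads instead of 32, the KV cache stores only a quarter of the key/value tensors a full multi-head layout would need, which is what makes GQA attractive at inference time.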

  • BFloat16 precision for optimal performance-efficiency balance
  • Supports input context lengths up to 8k tokens
  • Trained on 94 billion tokens using distillation techniques
  • Compatible with NVIDIA Ampere, Blackwell, Hopper, and Lovelace architectures
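As a sanity check, the 4.51B parameter figure can be roughly reproduced from the dimensions listed above. The sketch below assumes several details inherited from the Llama-3.1 parent that this page does not state: a 128,256-token vocabulary, untied input/output embeddings, 8 KV heads, and a head dimension of 128.

```python
# Back-of-the-envelope parameter count from the published dimensions.
vocab, d, layers, ffn = 128_256, 3072, 32, 9216   # vocab size assumed from Llama-3.1
heads, kv_heads, head_dim = 32, 8, 128            # kv_heads/head_dim assumed from parent

embed = vocab * d                                 # input embeddings
lm_head = vocab * d                               # untied output projection (assumption)
attn = d * heads * head_dim * 2 + d * kv_heads * head_dim * 2  # Q,O + K,V projections
mlp = 3 * d * ffn                                 # gate, up, down projections
norms = 2 * d                                     # two RMSNorms per layer
total = embed + lm_head + layers * (attn + mlp + norms) + d    # + final norm
print(f"{total / 1e9:.2f}B")  # 4.51B
```

Under these assumptions the count lands almost exactly on the advertised 4.51B, with roughly 3.7B parameters in the transformer layers and about 0.8B in the untied embedding matrices.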

Core Capabilities

  • Achieves 60.5 on MMLU (Massive Multitask Language Understanding, 5-shot)
  • Strong zero-shot performance: 76.1 on HellaSwag, 73.5 on Winogrande
  • 41.2 score on GSM8K for mathematical reasoning
  • 32.0 score on MBPP for code generation tasks
  • Multilingual support with emphasis on English content

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient architecture achieved through systematic width pruning and knowledge distillation, making it particularly suitable for commercial applications while maintaining strong performance metrics.
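The distillation step trains the pruned student to match the teacher's output distribution over the vocabulary. A minimal sketch of a logit-distillation loss, using the generic forward-KL formulation rather than NVIDIA's exact training objective:

```python
import numpy as np

def softmax(x, t=1.0):
    z = np.exp((x - x.max(-1, keepdims=True)) / t)
    return z / z.sum(-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, temperature=1.0):
    """Forward KL(teacher || student) averaged over positions -- the usual
    logit-distillation objective (a generic sketch, not NVIDIA's exact recipe)."""
    p = softmax(teacher_logits, temperature)
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(student_logits, temperature) + 1e-12)
    return float((p * (log_p - log_q)).sum(-1).mean())

rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 10))  # 4 positions, toy vocabulary of 10 tokens
student = teacher.copy()
print(distill_loss(student, teacher))   # 0.0 when the distributions match
```

The loss is zero when student and teacher agree and grows as their predicted distributions diverge, so minimizing it pulls the pruned model back toward the parent's behavior.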

Q: What are the recommended use cases?

The model excels in various natural language generation tasks, particularly within commercial environments. It's optimized for text generation, comprehension, and code-related tasks, with effective performance in both zero-shot and few-shot scenarios.
