Llama-3.1-Minitron-4B-Width-Base

Maintained By
nvidia

| Property | Value |
|---|---|
| Parameter Count | 4.51B |
| Model Type | Transformer Decoder (Auto-Regressive) |
| Architecture | Llama-3.1 with GQA and RoPE |
| License | NVIDIA Open Model License |
| Research Paper | Technical Report |
| Training Period | July 29, 2024 - Aug 3, 2024 |

What is Llama-3.1-Minitron-4B-Width-Base?

Llama-3.1-Minitron-4B-Width-Base is a language model that NVIDIA derived from the larger Llama-3.1-8B model by width pruning: the embedding and MLP hidden dimensions were reduced, and the pruned model was retrained with knowledge distillation. The result retains much of the base model's accuracy at a substantially lower compute and memory cost.

Implementation Details

The model uses 3072 embedding dimensions, 32 attention heads, and a 9216-dimensional MLP intermediate layer across 32 transformer layers. It employs Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE), both inherited from the Llama-3.1 architecture.
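The dimensions above can be sanity-checked against the 4.51B parameter count with a back-of-the-envelope calculation. Note that several values below are assumptions not stated on this card: Llama-3.1's 128,256-token vocabulary, 128-dimensional attention heads, 8 GQA key/value heads, and untied input/output embeddings.

```python
# Rough parameter count from the quoted architecture dimensions.
HIDDEN = 3072        # embedding dimension (from the card)
MLP = 9216           # MLP intermediate size (from the card)
LAYERS = 32          # transformer layers (from the card)
HEADS = 32           # attention heads (from the card)
HEAD_DIM = 128       # assumed, inherited from Llama-3.1-8B
KV_HEADS = 8         # assumed GQA key/value heads
VOCAB = 128_256      # assumed Llama-3.1 vocabulary size

attn = (HIDDEN * HEADS * HEAD_DIM            # Q projection
        + 2 * HIDDEN * KV_HEADS * HEAD_DIM   # K and V projections (GQA)
        + HEADS * HEAD_DIM * HIDDEN)         # output projection
mlp = 3 * HIDDEN * MLP                       # gate, up, and down projections
per_layer = attn + mlp

total = LAYERS * per_layer + 2 * VOCAB * HIDDEN  # layers + embeddings + LM head
print(f"{total / 1e9:.2f}B parameters")  # -> 4.51B, matching the card
```

Under these assumptions the width pruning shows up only in the embedding and MLP dimensions (3072 and 9216, down from 4096 and 14336 in Llama-3.1-8B), while the attention shape is unchanged.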

  • BFloat16 precision for optimal performance-efficiency balance
  • Supports context lengths up to 8k tokens
  • Trained on 94 billion tokens using distillation techniques
  • Compatible with NVIDIA Ampere, Blackwell, Hopper, and Lovelace architectures
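The deployment notes above can be sketched as a minimal loading recipe. This is a sketch, not an official snippet: the Hugging Face repo id is the model's public name, `8192` is one reading of the card's "8k" limit, and the actual `transformers` load call (shown in comments) requires a supported NVIDIA GPU and a model download.

```python
# Recommended load settings implied by the card: BF16 precision, 8k context.
MODEL_ID = "nvidia/Llama-3.1-Minitron-4B-Width-Base"
MAX_INPUT_TOKENS = 8192  # assumption: "8k" read as 8192 tokens


def load_settings():
    """Return from_pretrained kwargs matching the card's recommendations."""
    return {
        "pretrained_model_name_or_path": MODEL_ID,
        "torch_dtype": "bfloat16",  # BF16, per the card
        "device_map": "auto",       # spread layers across available GPUs
    }


# Typical use (requires `transformers`, `torch`, and an NVIDIA GPU):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained(MODEL_ID)
# model = AutoModelForCausalLM.from_pretrained(**load_settings())
# inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
# print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```

`transformers` accepts the string `"bfloat16"` for `torch_dtype`, which keeps this sketch importable without `torch` installed.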

Core Capabilities

  • Achieves 60.5 on MMLU (Massive Multitask Language Understanding, 5-shot)
  • Strong zero-shot performance: 76.1 on HellaSwag, 73.5 on Winogrande
  • 41.2 score on GSM8K for mathematical reasoning
  • 32.0 score on MBPP for code generation tasks
  • Multilingual support with emphasis on English content

Frequently Asked Questions

Q: What makes this model unique?

Its compact architecture was obtained by systematic width pruning of Llama-3.1-8B followed by knowledge distillation, retaining strong benchmark performance at roughly half the parameter count. Its license also makes it suitable for commercial applications.

Q: What are the recommended use cases?

The model is a general-purpose base model for natural language generation. It performs well on text generation, comprehension, and code-related tasks in both zero-shot and few-shot settings, and is licensed for commercial deployment.
