Nemotron-4-340B-Instruct
| Property | Value |
|---|---|
| Parameter Count | 340B |
| License | NVIDIA Open Model License |
| Context Length | 4,096 tokens |
| Training Data | 9 trillion tokens |
| Paper | Technical Report |
What is Nemotron-4-340B-Instruct?
Nemotron-4-340B-Instruct is an NVIDIA large language model designed for synthetic data generation and chat applications. It has 340B parameters and was trained on 9 trillion tokens spanning over 50 natural languages and more than 40 coding languages.
Implementation Details
The model uses a decoder-only Transformer architecture with Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). It went through multiple alignment steps: Supervised Fine-tuning (SFT), Direct Preference Optimization (DPO), and Reward-aware Preference Optimization (RPO), a method introduced by NVIDIA.
- Hardware requirements: 8x H200 or 16x H100/A100 80GB GPUs
- BF16 precision recommended for inference
- Supports both single-turn and multi-turn conversations
- Context window of 4,096 tokens
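The GPU counts above follow from simple memory arithmetic. The sketch below is a back-of-envelope check, not an official sizing guide: it counts weight memory only, and the per-GPU capacities (141 GB for H200, 80 GB for H100 and A100 80GB) are assumptions taken from vendor specifications rather than from this card.

```python
# Back-of-envelope check: do the BF16 weights fit on each listed GPU setup?
PARAMS = 340e9        # parameter count from the table above
BYTES_PER_PARAM = 2   # BF16 stores each value in 16 bits

# Assumed per-GPU memory in GB (vendor-published capacities, not from this card).
GPU_MEMORY_GB = {"H200": 141, "H100": 80, "A100 80GB": 80}


def weight_memory_gb(params: float = PARAMS,
                     bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def headroom_gb(gpu: str, count: int) -> float:
    """Aggregate GPU memory left after loading the weights.

    The remainder has to hold the KV cache, activations, and runtime overhead.
    A negative value means the weights alone do not fit.
    """
    return GPU_MEMORY_GB[gpu] * count - weight_memory_gb()


for gpu, count in [("H200", 8), ("H100", 16), ("A100 80GB", 16), ("H100", 8)]:
    print(f"{count}x {gpu}: {headroom_gb(gpu, count):.0f} GB headroom")
```

Under these assumptions the weights take about 680 GB, which exceeds the 640 GB total of eight 80 GB GPUs. That is consistent with the card listing 16 H100/A100 GPUs but only 8 H200 GPUs.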
Core Capabilities
- Achieves 78.7% on MMLU (0-shot)
- 92.3% accuracy on GSM8K math problems
- 73.2% pass rate on HumanEval coding tasks
- Strong performance in multilingual tasks
- Advanced synthetic data generation capabilities
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for how heavily its alignment relied on synthetic data: only about 20K human-annotated samples were used, while over 98% of the data used in the alignment process was generated by a synthetic data pipeline. It was also aligned with NVIDIA's RPO technique, and it targets instruction-following and chat applications.
Q: What are the recommended use cases?
The model excels in synthetic data generation for training other LLMs, English-language chat applications, mathematical reasoning, coding tasks, and instruction-following scenarios. It's particularly valuable for developers and enterprises looking to build and customize their own language models.
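As an illustration of the synthetic data use case, the sketch below shows a generic generate-then-filter loop. It is not NVIDIA's pipeline: `generate` and `score` are hypothetical callables standing in for a call to the model and for a quality scorer of your choice, and the stub implementations exist only so the example runs on its own.

```python
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], List[str]],  # hypothetical: returns n responses
    score: Callable[[str, str], float],         # hypothetical: quality score
    n: int = 4,
    min_score: float = 0.5,
) -> List[Tuple[str, str]]:
    """Generate n candidates and keep the best one if it clears min_score."""
    candidates = generate(prompt, n)
    if not candidates:
        return []
    best = max(candidates, key=lambda response: score(prompt, response))
    return [(prompt, best)] if score(prompt, best) >= min_score else []


# Stubs for demonstration only; replace with real model and scorer calls.
def fake_generate(prompt: str, n: int) -> List[str]:
    return [f"{prompt} -> answer {i}" for i in range(n)]


def fake_score(prompt: str, response: str) -> float:
    # Toy rule: a higher candidate index gets a higher score.
    return int(response.rsplit(" ", 1)[-1]) / 10


dataset: List[Tuple[str, str]] = []
for p in ["Explain GQA", "Sum 2 and 3"]:
    dataset.extend(best_of_n(p, fake_generate, fake_score, n=4, min_score=0.2))
```

Keeping only the top-scoring response per prompt trades volume for quality. Lowering `min_score` or keeping several candidates per prompt shifts that balance the other way.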