Nemotron-4-340B-Reward
| Property | Value |
|---|---|
| Parameters | 340B |
| Architecture | Transformer Decoder |
| License | NVIDIA Open Model License |
| Context Length | 4,096 tokens |
| Paper | HelpSteer2 Paper |
What is Nemotron-4-340B-Reward?
Nemotron-4-340B-Reward is a multi-dimensional reward model developed by NVIDIA for evaluating AI-generated responses. Built on the Nemotron-4-340B-Base model, it adds a linear layer that maps the final-layer representation to five scalar values, each corresponding to a response-quality attribute defined in the HelpSteer2 framework.
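The reward head described above can be sketched as a single linear projection from the final hidden state to five attribute scores. This is a minimal illustration, not the actual implementation: the dimensions, weights, and bias below are made-up placeholders, while the real model projects from the 340B base model's hidden size using learned parameters.

```python
# The five HelpSteer2 attributes the linear head predicts, in the order
# listed in the model description (ordering of the rows is an assumption).
HELPSTEER2_ATTRIBUTES = [
    "helpfulness", "correctness", "coherence", "complexity", "verbosity",
]

def reward_head(hidden_state, weights, bias):
    """Project a final-layer representation onto five scalar attribute scores.

    hidden_state: list of floats (the last-token representation)
    weights: one weight row per attribute (5 rows)
    bias: one bias term per attribute (5 values)
    """
    scores = {}
    for name, row, b in zip(HELPSTEER2_ATTRIBUTES, weights, bias):
        # Dot product of the hidden state with this attribute's weight row.
        scores[name] = sum(h * w for h, w in zip(hidden_state, row)) + b
    return scores
```

In the real model this projection runs once per response, on top of the transformer decoder's final layer, yielding one scalar per attribute rather than a single aggregate reward.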
Implementation Details
Inference requires substantial computational resources: 16x H100 GPUs (2 H100 nodes) or 16x A100 80GB GPUs (2 A100 nodes) with BF16 precision. The model was trained for two epochs on the NVIDIA HelpSteer2 dataset and achieves strong scores on the RewardBench evaluation suite.
- Supports context length of 4,096 tokens
- Outputs 9 float values (the 5 primary HelpSteer2 attributes plus 4 additional values)
- Trained between December 2023 and May 2024
- Uses data with cutoff of June 2023
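Given the 9-float output noted above, downstream code typically separates the five named attributes from the remaining values. The helper below is a sketch under the assumption that the first five floats correspond, in order, to the HelpSteer2 attributes; verify the ordering against the official model card before relying on it.

```python
HELPSTEER2_ATTRIBUTES = [
    "helpfulness", "correctness", "coherence", "complexity", "verbosity",
]

def parse_reward_output(values):
    """Split the model's 9-float output into named attributes and extras.

    Returns a dict of the five HelpSteer2 attribute scores and a list of
    the four additional values.
    """
    if len(values) != 9:
        raise ValueError(f"expected 9 floats, got {len(values)}")
    attrs = dict(zip(HELPSTEER2_ATTRIBUTES, values[:5]))
    extra = list(values[5:])
    return attrs, extra
```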
Core Capabilities
- Evaluates Helpfulness: Overall response utility
- Assesses Correctness: Factual accuracy and completeness
- Measures Coherence: Expression clarity and consistency
- Rates Complexity: Required intellectual depth
- Gauges Verbosity: Detail level relative to prompt
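When a training pipeline needs a single scalar reward rather than five attribute scores, a common approach is a weighted sum over the attributes above. The weights here are illustrative placeholders only, not values used by NVIDIA; note that a negative weight on verbosity would penalize padding, which is one plausible design choice.

```python
def scalar_reward(attrs, weights=None):
    """Collapse five attribute scores into one scalar via a weighted sum.

    attrs: dict mapping attribute name -> score
    weights: dict mapping attribute name -> weight (placeholders by default)
    """
    if weights is None:
        # Illustrative weights: favor helpfulness/correctness, discount
        # complexity, and mildly penalize verbosity. Not NVIDIA's values.
        weights = {
            "helpfulness": 1.0,
            "correctness": 1.0,
            "coherence": 0.5,
            "complexity": 0.0,
            "verbosity": -0.5,
        }
    return sum(weights[name] * score for name, score in attrs.items())
```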
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its ability to provide multi-dimensional evaluation of AI responses, offering a more nuanced assessment than traditional single-score reward models. It's particularly valuable for synthetic data generation and RLAIF (Reinforcement Learning from AI Feedback).
Q: What are the recommended use cases?
The model is ideal for alignment stage development, synthetic data generation, and as a reward-model-as-a-judge system. It's specifically designed for English language applications and can be integrated into training pipelines using the NeMo Aligner framework.
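One concrete form of the reward-model-as-a-judge use case is best-of-N sampling: generate several candidate responses and keep the one the reward model scores highest. The sketch below uses a pluggable `score_fn` as a stand-in for an actual call to Nemotron-4-340B-Reward (for example through a NeMo Aligner pipeline); the function name and interface are illustrative.

```python
def best_of_n(prompt, candidates, score_fn):
    """Return the candidate response with the highest reward score.

    score_fn(prompt, response) -> float stands in for a reward-model call;
    in practice it would query Nemotron-4-340B-Reward and reduce its
    attribute scores to a single scalar.
    """
    if not candidates:
        raise ValueError("no candidate responses to rank")
    scored = [(score_fn(prompt, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[0][1]
```

The same loop also underpins synthetic data generation: keeping only top-scoring responses filters a raw sample pool into higher-quality training data.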