Nemotron-4-340B-Base

Maintained By: nvidia

  • Parameter Count: 340 Billion
  • Model Type: Transformer Decoder
  • Architecture: Nemotron-4 with GQA and RoPE
  • License: NVIDIA Open Model License
  • Training Tokens: 9 Trillion
  • Context Length: 4,096 tokens

What is Nemotron-4-340B-Base?

Nemotron-4-340B-Base is NVIDIA's flagship large language model, designed for synthetic data generation and diverse language processing tasks. Trained on 9 trillion tokens, the model demonstrates strong capabilities across more than 50 natural languages and more than 40 programming languages. It represents a significant advancement in multilingual language understanding, combining an initial 8-trillion-token pre-training run with a further 1 trillion tokens of continued pre-training.

Implementation Details

The model uses a Transformer Decoder architecture with Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). It was trained with a global batch size of 2,304, and BF16 inference requires substantial hardware: typically 8x H200, 16x H100 80GB, or 16x A100 80GB GPUs distributed across multiple nodes.
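
As a rough sanity check on the GPU requirements above, the sketch below estimates the memory footprint of the BF16 weights alone (2 bytes per parameter) against each listed configuration. It is a back-of-the-envelope illustration, not an official sizing guide: it ignores KV cache, activations, and framework overhead, and the per-GPU memory figures (e.g., 141 GB for H200) are assumptions rather than values taken from the model card.

```python
# Back-of-the-envelope BF16 memory estimate for the 340B-parameter weights.
PARAMS = 340e9           # 340 billion parameters
BYTES_PER_PARAM = 2      # BF16 stores each parameter in 2 bytes

weight_gib = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"Weights alone: ~{weight_gib:.0f} GiB")   # ~633 GiB

# Assumed total memory per configuration listed above (vendor GB, i.e. 1e9 bytes).
configs = {
    "8x H200":        8 * 141e9,
    "16x H100 80GB": 16 * 80e9,
    "16x A100 80GB": 16 * 80e9,
}
for name, total_bytes in configs.items():
    total_gib = total_bytes / 1024**3
    print(f"{name}: {total_gib:.0f} GiB total, "
          f"~{total_gib - weight_gib:.0f} GiB left for KV cache and activations")
```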

  • Initially pre-trained on 8 trillion tokens
  • Followed by 1 trillion tokens of continued pre-training
  • Supports BF16 inference
  • Compatible with NVIDIA NeMo Framework
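
Since GQA is central to the architecture, the NumPy sketch below illustrates the mechanism in isolation: several query heads share one key/value head, which shrinks the KV cache relative to standard multi-head attention. The dimensions and head counts are toy values chosen for readability, not the model's actual configuration, and RoPE is omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, n_q_heads, n_kv_heads):
    """q: (seq, n_q_heads, d); k, v: (seq, n_kv_heads, d).
    Each KV head serves a group of n_q_heads // n_kv_heads query heads."""
    seq, _, d = q.shape
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq, seq), dtype=bool), 1)
    for h in range(n_q_heads):
        kv = h // group  # index of the shared KV head for this query head
        scores = q[:, h, :] @ k[:, kv, :].T / np.sqrt(d)   # (seq, seq)
        scores = np.where(mask, -np.inf, scores)
        out[:, h, :] = softmax(scores) @ v[:, kv, :]
    return out

# Toy dimensions for illustration only (not the real model's sizes).
seq, d = 8, 16
n_q_heads, n_kv_heads = 8, 2
rng = np.random.default_rng(0)
q = rng.normal(size=(seq, n_q_heads, d))
k = rng.normal(size=(seq, n_kv_heads, d))
v = rng.normal(size=(seq, n_kv_heads, d))
print(grouped_query_attention(q, k, v, n_q_heads, n_kv_heads).shape)  # (8, 8, 16)
```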

Core Capabilities

  • Multilingual support for 50+ natural languages
  • Code generation across 40+ programming languages
  • Strong performance in zero-shot learning (90.53% on HellaSwag)
  • Impressive code generation capabilities (57.3% pass@1 on HumanEval)
  • Advanced mathematical reasoning across multiple languages

Frequently Asked Questions

Q: What makes this model unique?

The model's massive scale (340B parameters), extensive training on diverse data, and continued pre-training phase make it particularly effective for synthetic data generation and multilingual applications. Its commercial usability under the NVIDIA Open Model License also sets it apart from many other large language models.

Q: What are the recommended use cases?

The model excels at synthetic data generation, multilingual text processing, and code generation. It is particularly suited to researchers and developers building their own LLMs, and can be customized using NVIDIA's NeMo Framework tools, including Parameter-Efficient Fine-Tuning and Model Alignment techniques. A minimal synthetic-data generation loop is sketched below.
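
Because the most common workflow around this checkpoint is synthetic data generation, here is a minimal driver loop as a sketch. The `generate` helper is a placeholder for whatever serving stack the model is actually deployed with (for example a NeMo Framework inference server); the topics, prompt format, and sampling defaults are illustrative assumptions, not recommendations from the model card.

```python
import json
import random

# Arbitrary placeholder topics; in practice these would come from a task
# taxonomy or seed dataset you want to expand.
TOPICS = ["data structures", "unit testing", "SQL joins", "regular expressions"]

def generate(prompt: str, max_tokens: int = 512, temperature: float = 0.9) -> str:
    """Placeholder: replace with a call to your Nemotron-4-340B-Base deployment.
    Returns a dummy string here so the loop runs end to end."""
    return f"<completion for prompt starting with {prompt[:30]!r}>"

def make_synthetic_examples(n: int) -> list[dict]:
    examples = []
    for _ in range(n):
        topic = random.choice(TOPICS)
        # A base model continues text rather than following instructions,
        # so the prompt is written as the opening of the artifact we want.
        prompt = f"Here is a challenging interview question about {topic}:\n"
        completion = generate(prompt)
        examples.append({"topic": topic, "prompt": prompt, "completion": completion})
    return examples

if __name__ == "__main__":
    for example in make_synthetic_examples(3):
        print(json.dumps(example, ensure_ascii=False))
```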
