nemo-megatron-gpt-1.3B

Maintained By: nvidia

NeMo Megatron-GPT 1.3B

Property          Value
Parameter Count   1.3 Billion
Architecture      Transformer Decoder
Training Data     The Pile Dataset
License           CC-BY-4.0
Framework         PyTorch/NeMo

What is nemo-megatron-gpt-1.3B?

NeMo Megatron-GPT 1.3B is a transformer-based language model developed by NVIDIA using its NeMo framework. It is a decoder-only model in the style of GPT-2 and GPT-3, contains 1.3 billion parameters, and is small enough to run efficiently on a single NVIDIA GPU with tensor parallelism (TP) and pipeline parallelism (PP) both set to 1.

Implementation Details

The model is implemented using NVIDIA's NeMo toolkit and trained on The Pile dataset. It features a transformer decoder architecture optimized for text generation tasks and can be deployed using NVIDIA's evaluation server setup.

  • Zero-shot learning capabilities with strong performance across multiple benchmarks
  • Supports customizable generation parameters including temperature, top-k, and top-p sampling (see the sketch after this list)
  • Implements repetition penalty for improved output quality
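
The generation parameters above map directly onto the request payload accepted by the evaluation server. The following is a minimal client-side sketch, assuming the server from NeMo's megatron_gpt_eval.py example has been launched with server=True and the 1.3B checkpoint, and is listening on localhost port 5555; the port, endpoint, and field names follow the published example and may vary between NeMo versions.

```python
# Hedged sketch: query a locally running NeMo evaluation server hosting Megatron-GPT 1.3B.
# Assumes megatron_gpt_eval.py was started with server=True (port 5555 in the published example).
import json
import requests

PORT = 5555  # assumed default port from the published example
HEADERS = {"Content-Type": "application/json"}

payload = {
    "sentences": ["Deep learning is"],   # single interactive prompt
    "tokens_to_generate": 64,
    "temperature": 1.0,                  # softmax temperature
    "top_k": 0,                          # 0 disables top-k filtering
    "top_p": 0.9,                        # nucleus sampling threshold
    "greedy": False,                     # sample instead of argmax decoding
    "repetition_penalty": 1.2,           # discourage repeated tokens
    "add_BOS": True,
    "min_tokens_to_generate": 2,
    "all_probs": False,
}

# The published setup exposes generation via an HTTP PUT to /generate.
resp = requests.put(f"http://localhost:{PORT}/generate",
                    data=json.dumps(payload),
                    headers=HEADERS)
resp.raise_for_status()
print(resp.json()["sentences"][0])
```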

Core Capabilities

  • Text generation with controllable parameters
  • Zero-shot performance on various NLP tasks (ARC, RACE, Winogrande, etc.)
  • Achieves notable zero-shot scores: 68.34% on PIQA and 59.79% on BoolQ
  • Supports both interactive and batch processing modes (a batch sketch follows this list)
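
Batch mode reuses the same hypothetical endpoint: the sentences field simply carries several prompts, and the server returns one completion per prompt. A short sketch under the same assumptions as the previous example:

```python
# Hedged batch sketch: several prompts in one request to the same assumed /generate endpoint.
import json
import requests

prompts = [
    "The Pile dataset is",
    "Zero-shot learning means that",
    "Tensor parallelism splits",
]

payload = {
    "sentences": prompts,           # one completion is returned per prompt
    "tokens_to_generate": 32,
    "temperature": 0.7,
    "top_k": 0,
    "top_p": 0.9,
    "greedy": False,
    "repetition_penalty": 1.2,
    "add_BOS": True,
    "min_tokens_to_generate": 2,
    "all_probs": False,
}

resp = requests.put("http://localhost:5555/generate",
                    data=json.dumps(payload),
                    headers={"Content-Type": "application/json"})
resp.raise_for_status()
for completion in resp.json()["sentences"]:
    print(completion)
```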

Frequently Asked Questions

Q: What makes this model unique?

The model combines the efficiency of NVIDIA's NeMo framework with the scale of 1.3B parameters, making it accessible for deployment on a single GPU while maintaining strong performance on various NLP tasks.

Q: What are the recommended use cases?

The model is well-suited for text generation tasks, zero-shot learning applications, and general language understanding tasks. However, users should be aware of potential biases inherited from internet-based training data.
