GPT-2 From Scratch PyTorch

  • Author: rasbt
  • Model Variants: Small (124M), Medium (355M), Large (774M), XL (1558M)
  • Framework: PyTorch
  • Repository: Hugging Face

What is gpt2-from-scratch-pytorch?

This is a PyTorch implementation of OpenAI's GPT-2 model, featuring converted weights from the original TensorFlow implementation. The project provides a clean, from-scratch implementation with support for multiple model sizes, ranging from 124M to 1.5B parameters.

Implementation Details

The implementation includes a complete PyTorch architecture with configurable embedding dimensions, attention heads, and layer counts; a configuration sketch follows the list below. The model uses the tiktoken tokenizer and supports loading weights from both PyTorch state dicts and safetensors files.

  • Vocabulary size: 50,257 tokens
  • Maximum context length: 1,024 tokens
  • Configurable embedding dimensions (768-1600)
  • Variable attention heads (12-25)
  • Adjustable layer counts (12-48)
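
The four published sizes differ mainly in these three hyperparameters. A minimal configuration sketch is shown below; the dictionary key names are illustrative assumptions and may not match the repository's exact code:

```python
# Illustrative configuration sketch; key names are assumptions and may
# differ from the repository's actual code.
GPT2_CONFIGS = {
    "gpt2-small (124M)":  {"emb_dim": 768,  "n_heads": 12, "n_layers": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_heads": 16, "n_layers": 24},
    "gpt2-large (774M)":  {"emb_dim": 1280, "n_heads": 20, "n_layers": 36},
    "gpt2-xl (1558M)":    {"emb_dim": 1600, "n_heads": 25, "n_layers": 48},
}

# Settings shared by all four variants.
BASE_CONFIG = {
    "vocab_size": 50257,     # GPT-2 BPE vocabulary size
    "context_length": 1024,  # maximum sequence length
}
```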

Core Capabilities

  • Text generation with a configurable number of new tokens (see the usage sketch after this list)
  • Support for multiple model sizes and configurations
  • Compatible with both .pth and .safetensors formats
  • Implements the original GPT-2 architecture with query-key-value attention
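
As an example of how generation could be driven, the sketch below pairs the tiktoken GPT-2 encoding with a simple greedy decoding loop. The `model` object and its call signature are assumptions for illustration, not the repository's exact API:

```python
import tiktoken
import torch

tokenizer = tiktoken.get_encoding("gpt2")

def generate_text(model, prompt, max_new_tokens=50, context_length=1024):
    """Greedy decoding loop; assumes `model` maps a (batch, seq_len) tensor of
    token IDs to logits of shape (batch, seq_len, vocab_size)."""
    model.eval()
    token_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
    for _ in range(max_new_tokens):
        context = token_ids[:, -context_length:]  # respect the 1,024-token window
        with torch.no_grad():
            logits = model(context)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        token_ids = torch.cat([token_ids, next_id], dim=1)
    return tokenizer.decode(token_ids.squeeze(0).tolist())
```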

Frequently Asked Questions

Q: What makes this model unique?

This implementation provides a from-scratch PyTorch version of GPT-2, making it ideal for learning and understanding the architecture. It maintains compatibility with OpenAI's original weights while offering flexible loading options.
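
For illustration, the two weight-loading paths might look like the sketch below; the file names and the `GPTModel` class are placeholders, not names guaranteed by the repository:

```python
import torch
from safetensors.torch import load_file

# Placeholder model construction; GPTModel and the config dicts are assumptions.
# model = GPTModel({**BASE_CONFIG, **GPT2_CONFIGS["gpt2-small (124M)"]})

# Option 1: standard PyTorch state dict (.pth)
state_dict = torch.load("gpt2-small-124M.pth", map_location="cpu", weights_only=True)

# Option 2: safetensors format (.safetensors)
state_dict = load_file("gpt2-small-124M.safetensors")

# model.load_state_dict(state_dict)
```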

Q: What are the recommended use cases?

The model is particularly useful for educational purposes, text generation tasks, and as a foundation for further research. It's also valuable for those who want to understand GPT-2's architecture or need a PyTorch-native implementation.
