GPT-2 From Scratch PyTorch
| Property | Value |
|---|---|
| Author | rasbt |
| Model Variants | Small (124M), Medium (355M), Large (774M), XL (1558M) |
| Framework | PyTorch |
| Repository | Hugging Face |
What is gpt2-from-scratch-pytorch?
This is a PyTorch implementation of OpenAI's GPT-2 model, with weights converted from the original TensorFlow checkpoints. The project provides a clean, from-scratch implementation and supports multiple model sizes, ranging from 124M to 1.5B parameters.
Implementation Details
The implementation is a complete PyTorch architecture with configurable embedding dimensions, attention head counts, and layer counts (see the configuration sketch after the list below). The model uses the tiktoken tokenizer and supports loading weights from both PyTorch state dicts and safetensors files.
- Vocabulary size: 50,257 tokens
- Maximum context length: 1,024 tokens
- Embedding dimensions: 768 to 1,600, depending on model size
- Attention heads: 12 to 25
- Transformer layers: 12 to 48
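The per-variant hyperparameters follow the published GPT-2 family. Below is a minimal configuration sketch; the dictionary key names are illustrative assumptions rather than the repository's exact API, but the numeric values match the original model sizes.

```python
# Illustrative configuration dictionaries; key names are assumptions,
# values correspond to the published GPT-2 model sizes.
BASE_CONFIG = {
    "vocab_size": 50257,     # BPE vocabulary used by tiktoken's "gpt2" encoding
    "context_length": 1024,  # maximum sequence length
    "drop_rate": 0.0,        # dropout disabled for inference
    "qkv_bias": True,        # original GPT-2 uses biases in the QKV projections
}

GPT2_SIZES = {
    "small (124M)":  {"emb_dim": 768,  "n_heads": 12, "n_layers": 12},
    "medium (355M)": {"emb_dim": 1024, "n_heads": 16, "n_layers": 24},
    "large (774M)":  {"emb_dim": 1280, "n_heads": 20, "n_layers": 36},
    "xl (1558M)":    {"emb_dim": 1600, "n_heads": 25, "n_layers": 48},
}
```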
Core Capabilities
- Text generation with a configurable maximum number of new tokens
- Support for multiple model sizes and configurations
- Compatible with both .pth and .safetensors weight formats (see the loading sketch after this list)
- Implements the original GPT-2 architecture with multi-head attention using query, key, and value projections
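As a rough illustration of how weight loading and generation could fit together, here is a minimal sketch. The `GPTModel` class name and checkpoint filenames are placeholders (the repository's actual class and file names may differ), the greedy decoding loop stands in for whatever generation helper the project provides, and the configuration dictionaries reuse the sketch above.

```python
import tiktoken
import torch
from safetensors.torch import load_file

# "GPTModel" is a placeholder for the repository's model class; filenames are illustrative.
cfg = {**BASE_CONFIG, **GPT2_SIZES["small (124M)"]}
model = GPTModel(cfg)
model.load_state_dict(load_file("model.safetensors"))                 # safetensors checkpoint
# model.load_state_dict(torch.load("model.pth", map_location="cpu"))  # or a .pth state dict
model.eval()

tokenizer = tiktoken.get_encoding("gpt2")
token_ids = torch.tensor([tokenizer.encode("Every effort moves you")])

# Greedy decoding with a configurable budget of new tokens
with torch.no_grad():
    for _ in range(25):
        logits = model(token_ids[:, -cfg["context_length"]:])  # truncate to context length
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        token_ids = torch.cat([token_ids, next_id], dim=1)

print(tokenizer.decode(token_ids.squeeze(0).tolist()))
```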
Frequently Asked Questions
Q: What makes this model unique?
This implementation provides a from-scratch PyTorch version of GPT-2, making it ideal for learning and understanding the architecture. It maintains compatibility with OpenAI's original weights while offering flexible loading options.
Q: What are the recommended use cases?
The model is particularly useful for education, for text generation experiments, and as a foundation for further research. It is also valuable for anyone who wants to study GPT-2's architecture or needs a PyTorch-native implementation.