Pythia-70M
| Property | Value |
|---|---|
| Parameter Count | 70.4M (18.9M non-embedding) |
| Architecture | GPT-NeoX |
| Training Data | The Pile |
| License | Apache 2.0 |
| Paper | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling |
What is Pythia-70M?
Pythia-70M is the smallest model in EleutherAI's Pythia Scaling Suite, designed specifically for interpretability research and the scientific study of language models. It has 6 layers, a model dimension of 512, and 8 attention heads, and was trained on The Pile dataset.
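These hyperparameters can be checked against the published configuration. A minimal sketch using the Hugging Face Transformers API:

```python
from transformers import AutoConfig

# Load the published configuration for Pythia-70M from the Hugging Face Hub.
config = AutoConfig.from_pretrained("EleutherAI/pythia-70m")

print(config.num_hidden_layers)    # 6 layers
print(config.hidden_size)          # model dimension of 512
print(config.num_attention_heads)  # 8 attention heads
```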
Implementation Details
The model implements a transformer-based architecture using the GPT-NeoX framework. It was trained for 143,000 steps at a batch size of 2M tokens with a learning rate of 1.0 × 10^-3. The model is compatible with the Hugging Face Transformers library and provides 154 intermediate checkpoints for research purposes; a loading sketch follows the list below.
- Trained on 299.89B tokens
- Uses Flash Attention
- Distributed with FP16 and U8 tensor types
- Provides a comprehensive set of intermediate checkpoints (step 0 through step 143000)
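Each intermediate checkpoint is exposed as a branch on the Hugging Face Hub, so a specific training step can be loaded via the `revision` argument. A minimal sketch (here `step3000` is one example revision; omitting `revision` loads the final weights):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Each intermediate checkpoint lives on its own branch (e.g. "step3000").
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step3000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m",
    revision="step3000",
)
```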
Core Capabilities
- Next token prediction for English text
- Research-focused applications
- Interpretability studies
- Base model for fine-tuning experiments
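As a base model, Pythia-70M performs plain next-token prediction rather than instruction following. A minimal sketch of greedy next-token prediction with the final checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
model.eval()

# Predict the single most likely next token for a short English prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))
```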
Frequently Asked Questions
Q: What makes this model unique?
This model is part of a carefully controlled experimental suite designed for scientific research. It provides 154 intermediate checkpoints, and all models in the suite were trained on the same data in the same order, making it well suited to studying how model behavior develops over training and across scales.
Q: What are the recommended use cases?
Pythia-70M is primarily intended for research purposes, particularly in studying model interpretability and behavior. It's not designed for deployment or production use cases, and should not be used for direct human-facing applications without appropriate fine-tuning and safety considerations.