Pythia-70M-deduped
| Property | Value |
|---|---|
| Parameter Count | 70.4M (18.9M non-embedding) |
| Model Type | Transformer-based Language Model |
| Architecture | 6 layers, 512 model dimension, 8 attention heads |
| License | Apache 2.0 |
| Paper | View Paper |
What is pythia-70m-deduped?
Pythia-70M-deduped is part of EleutherAI's Pythia Scaling Suite, a collection of models specifically developed for AI interpretability research. This particular model represents the smallest variant in the suite, trained on a deduplicated version of the Pile dataset. It features 70.4M parameters and is designed to provide researchers with a controlled environment for conducting scientific experiments on language model behavior.
Implementation Details
The model implements a GPT-NeoX architecture with 6 transformer layers, 512 dimensional embeddings, and 8 attention heads. It was trained with a batch size of 2M tokens and a learning rate of 1.0 x 10^-3. The training process involved 143,000 steps on the deduplicated Pile dataset, with 154 checkpoints available for research purposes.
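As a quick check, the architecture figures above can be read straight from the published model configuration. The sketch below assumes the Hugging Face Hub identifier `EleutherAI/pythia-70m-deduped` and the `transformers` library; the field names follow the GPT-NeoX configuration class.

```python
# Minimal sketch: inspect the GPT-NeoX configuration for pythia-70m-deduped.
# Assumes the transformers library and network access to the Hugging Face Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/pythia-70m-deduped")
print(config.num_hidden_layers)    # expected: 6
print(config.hidden_size)          # expected: 512
print(config.num_attention_heads)  # expected: 8
```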
- Trained on a deduplicated version of the Pile dataset
- Available in FP16 and U8 tensor formats
- Implements a causal (autoregressive) language modeling architecture
- Provides extensive checkpoint availability for research (see the loading sketch below)
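For studies that track behavior over the course of training, an intermediate checkpoint can be selected through the `revision` argument, as sketched below. This assumes the checkpoints are published as `stepN` branches on the Hugging Face Hub, following the convention used for the Pythia releases; verify the exact branch names against the model repository.

```python
# Sketch: load an intermediate training checkpoint of pythia-70m-deduped.
# "step3000" is one example branch name; branch names are an assumption here
# and should be checked against the hosted repository.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
)
```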
Core Capabilities
- Text generation and completion tasks (see the example after this list)
- Research-focused applications
- Interpretability studies
- Foundation for further fine-tuning
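A minimal generation sketch, assuming the final checkpoint on the Hugging Face Hub and the `transformers` library. At 70M parameters the completions are mainly useful as objects of study rather than for their quality; the decoding settings below are illustrative only.

```python
# Sketch: greedy text completion with the final pythia-70m-deduped checkpoint.
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m-deduped")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m-deduped")

inputs = tokenizer("The Pile is a large, diverse dataset for", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=False,                      # greedy decoding for reproducibility
        pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```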
Frequently Asked Questions
Q: What makes this model unique?
This model is unique in being part of a carefully controlled scaling study: all models in the suite were trained under identical conditions on the same data, and extensive intermediate checkpoints are published for each size, making it well suited to interpretability research.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, particularly in studying model behavior and interpretability. It is not recommended for production deployment or direct user-facing applications without additional fine-tuning and careful evaluation.