Pythia-70M-deduped
| Property | Value |
|---|---|
| Parameter Count | 70.4M (18.9M non-embedding) |
| Model Type | Transformer-based Language Model |
| Architecture | 6 layers, 512 model dimension, 8 attention heads |
| License | Apache 2.0 |
| Paper | View Paper |
What is pythia-70m-deduped?
Pythia-70M-deduped is part of EleutherAI's Pythia Scaling Suite, a collection of models specifically developed for AI interpretability research. This particular model represents the smallest variant in the suite, trained on a deduplicated version of the Pile dataset. It features 70.4M parameters and is designed to provide researchers with a controlled environment for conducting scientific experiments on language model behavior.
Implementation Details
The model implements a GPT-NeoX architecture with 6 transformer layers, 512 dimensional embeddings, and 8 attention heads. It was trained with a batch size of 2M tokens and a learning rate of 1.0 x 10^-3. The training process involved 143,000 steps on the deduplicated Pile dataset, with 154 checkpoints available for research purposes.
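As a quick check, the architecture figures above can be read straight from the published model configuration. The sketch below assumes the Hugging Face Hub identifier `EleutherAI/pythia-70m-deduped` and the `transformers` library; the field names follow the GPT-NeoX configuration class.

```python
# Minimal sketch: inspect the GPT-NeoX configuration for pythia-70m-deduped.
# Assumes the transformers library and network access to the Hugging Face Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/pythia-70m-deduped")
print(config.num_hidden_layers)    # expected: 6
print(config.hidden_size)          # expected: 512
print(config.num_attention_heads)  # expected: 8
```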
- Trained on a deduplicated version of the Pile dataset
- Available in FP16 and U8 tensor formats
- Implements a causal (autoregressive) language modeling architecture
- Provides extensive checkpoint availability for research (see the loading sketch below)
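For studies that track behavior over the course of training, an intermediate checkpoint can be selected through the `revision` argument, as sketched below. This assumes the checkpoints are published as `stepN` branches on the Hugging Face Hub, following the convention used for the Pythia releases; verify the exact branch names against the model repository.

```python
# Sketch: load an intermediate training checkpoint of pythia-70m-deduped.
# "step3000" is one example branch name; branch names are an assumption here
# and should be checked against the hosted repository.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
)
```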
Core Capabilities
- Text generation and completion tasks (see the example after this list)
- Research-focused applications
- Interpretability studies
- Foundation for further fine-tuning
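A minimal generation sketch, assuming the final checkpoint on the Hugging Face Hub and the `transformers` library. At 70M parameters the completions are mainly useful as objects of study rather than for their quality; the decoding settings below are illustrative only.

```python
# Sketch: greedy text completion with the final pythia-70m-deduped checkpoint.
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m-deduped")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m-deduped")

inputs = tokenizer("The Pile is a large, diverse dataset for", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=False,                      # greedy decoding for reproducibility
        pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```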
Frequently Asked Questions
Q: What makes this model unique?
This model is unique in being part of a carefully controlled scaling study: all models in the suite were trained under identical conditions on the same data, and extensive intermediate checkpoints are published for each size, making it well suited to interpretability research.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, particularly in studying model behavior and interpretability. It is not recommended for production deployment or direct user-facing applications without additional fine-tuning and careful evaluation.