pythia-70m-deduped

Maintained By: EleutherAI


  • Parameter Count: 70.4M (18.9M non-embedding)
  • Model Type: Transformer-based Language Model
  • Architecture: 6 layers, 512 model dimension, 8 attention heads
  • License: Apache 2.0
  • Paper: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling (arXiv:2304.01373)

What is pythia-70m-deduped?

Pythia-70M-deduped is part of EleutherAI's Pythia Scaling Suite, a collection of models specifically developed for AI interpretability research. This particular model represents the smallest variant in the suite, trained on a deduplicated version of the Pile dataset. It features 70.4M parameters and is designed to provide researchers with a controlled environment for conducting scientific experiments on language model behavior.

Implementation Details

The model implements a GPT-NeoX architecture with 6 transformer layers, 512 dimensional embeddings, and 8 attention heads. It was trained with a batch size of 2M tokens and a learning rate of 1.0 x 10^-3. The training process involved 143,000 steps on the deduplicated Pile dataset, with 154 checkpoints available for research purposes.
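At a batch size of 2,097,152 tokens (the "2M" figure above), 143,000 steps works out to roughly 300 billion tokens seen during training.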

  • Trained on deduplicated version of the Pile dataset
  • Available in FP16 and U8 tensor formats
  • Implements causal language modeling architecture
  • Provides extensive checkpoint availability for research (see the loading sketch after this list)
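
As a rough illustration, the snippet below sketches how the final model and an intermediate training checkpoint can be loaded with the Hugging Face transformers library. The revision name "step3000" follows the step-numbered branch scheme used for the Pythia checkpoints and is shown here only as an example.

```python
# Sketch: loading pythia-70m-deduped with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-70m-deduped"

# Final (main-branch) weights and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# An intermediate training checkpoint, selected by revision (branch) name.
# "step3000" is an illustrative choice from the step-numbered branches.
early_model = AutoModelForCausalLM.from_pretrained(model_name, revision="step3000")
```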

Core Capabilities

  • Text generation and completion tasks (a minimal generation sketch follows this list)
  • Research-focused applications
  • Interpretability studies
  • Foundation for further fine-tuning
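
The following is a minimal sketch of causal text generation with the released weights; the prompt and sampling parameters are illustrative, not tuned recommendations.

```python
# Sketch: sampled text generation with pythia-70m-deduped.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-70m-deduped"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The Pile is a dataset of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that at 70M parameters the completions are mainly useful as research artifacts rather than high-quality text, consistent with the research-focused intent described above.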

Frequently Asked Questions

Q: What makes this model unique?

This model is unique in being part of a carefully controlled scaling study, with identical training conditions across different model sizes and extensive checkpoint availability, making it ideal for interpretability research.

Q: What are the recommended use cases?

The model is primarily intended for research purposes, particularly in studying model behavior and interpretability. It is not recommended for production deployment or direct user-facing applications without additional fine-tuning and careful evaluation.
