Pythia-160M-deduped

Maintained by: EleutherAI

  • Parameter Count: 162.3M (85.1M non-embedding)
  • Model Type: Transformer-based Language Model
  • Architecture: 12 layers, 768 model dimension, 12 attention heads
  • License: Apache 2.0
  • Training Data: Deduplicated version of The Pile

What is pythia-160m-deduped?

Pythia-160M-deduped is part of the Pythia Scaling Suite, a collection of models specifically developed to facilitate interpretability research. This particular model represents the 160M parameter variant trained on a deduplicated version of The Pile dataset. It's architecturally equivalent to models like GPT-Neo 125M and OPT-125M, making it an excellent candidate for comparative research.
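For orientation, here is a minimal usage sketch with the Hugging Face transformers library; the repository name follows EleutherAI's published naming, and the prompt and decoding settings are illustrative rather than recommendations.

```python
# Minimal sketch: load Pythia-160M-deduped and generate a short continuation.
# Assumes the transformers and torch packages are installed.
from transformers import GPTNeoXForCausalLM, AutoTokenizer

repo = "EleutherAI/pythia-160m-deduped"
model = GPTNeoXForCausalLM.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

inputs = tokenizer("The Pile is a large, diverse dataset", return_tensors="pt")
# Greedy decoding of a short continuation; generation settings are illustrative.
tokens = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(tokens[0]))
```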

Implementation Details

The model uses 12 transformer layers, a model dimension of 768, and 12 attention heads. It was trained with a batch size of 2M tokens and a learning rate of 6.0 × 10⁻⁴ over 143,000 steps, with checkpoints saved at regular intervals so that researchers can study how the model develops across training.

  • Trained using the GPT-NeoX framework
  • Implements Flash Attention for improved efficiency
  • Provides 154 intermediate checkpoints for research purposes, loadable by revision as sketched after this list
  • Uses the same tokenizer as GPT-NeoX-20B
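Each checkpoint is published as a branch on the Hugging Face Hub (step0 through step143000), so an intermediate snapshot can be loaded by passing that branch name as the revision argument. The sketch below uses the step3000 and step143000 branches and an illustrative prompt to compare language-modeling loss early in training against the final model:

```python
# Sketch: load an early training checkpoint alongside the final model
# by selecting a branch via the `revision` argument.
import torch
from transformers import GPTNeoXForCausalLM, AutoTokenizer

repo = "EleutherAI/pythia-160m-deduped"
tokenizer = AutoTokenizer.from_pretrained(repo)

early = GPTNeoXForCausalLM.from_pretrained(repo, revision="step3000")
final = GPTNeoXForCausalLM.from_pretrained(repo, revision="step143000")

# Compare language-modeling loss on the same text at both checkpoints.
batch = tokenizer("Interpretability research benefits from open checkpoints.",
                  return_tensors="pt")
with torch.no_grad():
    for name, m in [("step3000", early), ("step143000", final)]:
        loss = m(**batch, labels=batch["input_ids"]).loss
        print(f"{name}: loss = {loss.item():.3f}")
```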

Core Capabilities

  • Next token prediction in English language text
  • Research-focused applications in model interpretability
  • Basis for fine-tuning in downstream tasks
  • Comparative studies with similar-sized models

Frequently Asked Questions

Q: What makes this model unique?

This model is part of a carefully controlled experimental suite where all models are trained on exactly the same data in the same order, making it invaluable for interpretability research. It also offers extensive checkpointing throughout the training process, allowing researchers to study model development.

Q: What are the recommended use cases?

The model is primarily intended for research purposes, particularly in studying model behavior and interpretability. It's not recommended for deployment in production environments or direct human-facing applications without appropriate fine-tuning and risk assessment.
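As a rough sketch of the fine-tuning route mentioned above, the example below adapts the model to a small public text corpus with the Hugging Face Trainer; the dataset, sequence length, hyperparameters, and output directory are placeholders rather than recommendations.

```python
# Hypothetical fine-tuning sketch using the Hugging Face Trainer API.
# Dataset choice, sequence length, and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoTokenizer, GPTNeoXForCausalLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

repo = "EleutherAI/pythia-160m-deduped"
tokenizer = AutoTokenizer.from_pretrained(repo)
tokenizer.pad_token = tokenizer.eos_token  # tokenizer has no pad token by default
model = GPTNeoXForCausalLM.from_pretrained(repo)

# Any small text dataset works for demonstration; wikitext-2 is a common choice.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
raw = raw.filter(lambda ex: ex["text"].strip() != "")  # drop empty lines

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pythia-160m-finetuned",
                           per_device_train_batch_size=4,
                           num_train_epochs=1,
                           learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```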
