Pythia-1B-deduped

Maintained by EleutherAI

  • Parameter Count: 1.08B
  • Model Type: Transformer-based language model
  • Architecture: 16 layers, 2048 model dimension, 8 attention heads
  • License: Apache 2.0
  • Paper: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling (arXiv:2304.01373)

What is pythia-1b-deduped?

Pythia-1B-deduped is part of EleutherAI's Pythia Scaling Suite, a collection of models specifically developed to facilitate interpretability research. This particular model contains 1.08B parameters and was trained on a deduplicated version of the Pile dataset, making it particularly valuable for scientific research on language model behavior and capabilities.

Implementation Details

The model uses the GPT-NeoX architecture with 16 layers, a model dimension of 2048, and 8 attention heads. It was trained for 143,000 steps at a batch size of roughly 2M tokens, with a cosine learning rate schedule that decays to 10% of the maximum rate. EleutherAI publishes 154 intermediate checkpoints, enabling detailed analysis of the model's learning progression.

  • Training dataset: The Pile (825GiB) after global deduplication
  • Training tokens: 299,892,736,000
  • Checkpoint frequency: Every 2,097,152,000 tokens
  • Framework: PyTorch with Flash Attention
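
As a minimal sketch of how the checkpoint system can be used with the Transformers library (the repository name and `stepN` revision branches follow EleutherAI's Hugging Face model card; the choice of step here is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the final model. Passing an earlier revision such as "step3000"
# retrieves one of the 154 published intermediate checkpoints instead.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b-deduped",
    revision="step143000",  # final training step
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-1b-deduped",
    revision="step143000",
)
```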

Core Capabilities

  • Next token prediction for English text generation
  • Research-focused applications in interpretability studies
  • Supports scientific experimentation on language model behavior
  • Compatible with Hugging Face Transformers library
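
As a brief illustration of next-token prediction through the Transformers API (the prompt and decoding settings below are arbitrary choices, not recommendations from the model card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-1b-deduped"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Greedy next-token generation on an arbitrary English prompt.
inputs = tokenizer("The Pile is a", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```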

Frequently Asked Questions

Q: What makes this model unique?

This model is part of a controlled experimental suite in which all models are trained on identical data in the same order, making it invaluable for studying model scaling and training dynamics. Training on globally deduplicated data also removes a confound (repeated documents) and allows direct comparison with the standard-Pile Pythia-1B counterpart.

Q: What are the recommended use cases?

The model is primarily intended for research on language model behavior and interpretability studies. It's not designed for deployment in production environments or direct human-facing applications. Researchers can utilize its extensive checkpoint system for studying model evolution during training.
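
One way to use the checkpoint system is to track a simple metric across training. The sketch below is illustrative only: the probe sentence and the sampled `stepN` revisions are arbitrary choices, though the revision branch names follow the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Arbitrary probe sentence and a hand-picked sample of checkpoint branches.
revisions = ["step1000", "step8000", "step64000", "step143000"]
text = "The quick brown fox jumps over the lazy dog."

for rev in revisions:
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b-deduped", revision=rev)
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b-deduped", revision=rev)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Next-token (language-modeling) loss, computed when labels are supplied.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{rev}: loss = {loss.item():.3f}")
```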
