Pythia-6.9B-Deduped
| Property | Value |
|---|---|
| Parameter Count | 6.9B (6,857,302,016 total) |
| Model Type | Transformer-based Language Model |
| License | Apache 2.0 |
| Paper | Link |
| Training Data | The Pile (Deduplicated) |
What is pythia-6.9b-deduped?
Pythia-6.9B-deduped is a large language model in EleutherAI's Pythia Scaling Suite, a collection of models designed specifically for interpretability research. The model has 32 transformer layers, a model dimension of 4096, and 32 attention heads, and was trained on a deduplicated version of The Pile dataset.
Implementation Details
The model is built on the GPT-NeoX architecture and was trained with a batch size of 2M tokens and a learning rate of 1.2 × 10⁻⁴ for 143,000 steps, seeing approximately 299.9B tokens in total (see the sketch after the list below).
- Architecture: 32 transformer layers with 4096-dimensional hidden states
- Attention Heads: 32
- Non-embedding Parameters: 6,444,163,072
- Training Dataset: Deduplicated version of The Pile
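The quoted figures are internally consistent: the training schedule implies the ~299.9B token count, and the input embedding plus the untied output (unembedding) matrix account for the gap between total and non-embedding parameters. A minimal arithmetic sketch, assuming the GPT-NeoX tokenizer's 50,432-entry vocabulary and untied embeddings (both assumptions not stated above):

```python
# Rough sanity check of the figures quoted above.

vocab_size = 50_432          # GPT-NeoX tokenizer vocabulary (assumption)
hidden_dim = 4_096           # model dimension
batch_tokens = 2_097_152     # "2M" tokens per step (2**21)
steps = 143_000

# Tokens seen during training: 2,097,152 * 143,000 ≈ 299.9B
print(f"Tokens seen: {batch_tokens * steps:,}")

# Input embedding + untied unembedding matrix bridges the parameter counts:
# 6,444,163,072 + 2 * 50,432 * 4,096 = 6,857,302,016
embedding_params = 2 * vocab_size * hidden_dim
non_embedding_params = 6_444_163_072
print(f"Total params: {non_embedding_params + embedding_params:,}")
```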
Core Capabilities
- English-language text generation and completion
- Scientific research and model interpretability studies
- Supports checkpoint analysis with 154 intermediate checkpoints
- Compatible with the Hugging Face Transformers library (see the loading sketch below)
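As a minimal usage sketch (assuming the `EleutherAI/pythia-6.9b-deduped` Hugging Face repository and its step-numbered checkpoint branches), the model and any intermediate checkpoint can be loaded by passing the branch name as `revision`:

```python
from transformers import AutoTokenizer, GPTNeoXForCausalLM

# Load the final checkpoint; pass an earlier step-numbered branch
# (e.g. "step3000") as `revision` to study an intermediate checkpoint.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-6.9b-deduped",
    revision="step143000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-6.9b-deduped",
    revision="step143000",
)

inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(tokens[0]))
```

The intermediate checkpoints follow the same `stepN` branch-naming convention, which is what makes the suite convenient for studying behavior across training.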
Frequently Asked Questions
Q: What makes this model unique?
This model is part of a carefully controlled experimental suite designed for interpretability research. It was trained on deduplicated data with consistent, precisely spaced checkpointing, which makes it valuable for studying how model behavior develops over the course of training.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, particularly in studying language model behavior and interpretability. It's not recommended for deployment in production environments or direct user-facing applications.