Pythia-6.9B-Deduped
| Property | Value |
|---|---|
| Parameter Count | 6.9B (6,857,302,016 total) |
| Model Type | Transformer-based Language Model |
| License | Apache 2.0 |
| Paper | Link |
| Training Data | The Pile (Deduplicated) |
What is pythia-6.9b-deduped?
Pythia-6.9B-deduped is a large language model in EleutherAI's Pythia Scaling Suite, a collection of models designed specifically for interpretability research. The model has 32 transformer layers, a model dimension of 4096, and 32 attention heads, and was trained on a deduplicated version of The Pile dataset.
Implementation Details
The model is built on the GPT-NeoX architecture and was trained with a batch size of 2M tokens and a learning rate of 1.2 × 10⁻⁴ for 143,000 steps, seeing approximately 299.9B tokens in total (see the sketch after the list below).
- Architecture: 32 transformer layers with 4096-dimensional hidden states
- Attention Heads: 32
- Non-embedding Parameters: 6,444,163,072
- Training Dataset: Deduplicated version of The Pile
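The quoted figures are internally consistent: the training schedule implies the ~299.9B token count, and the input embedding plus the untied output (unembedding) matrix account for the gap between total and non-embedding parameters. A minimal arithmetic sketch, assuming the GPT-NeoX tokenizer's 50,432-entry vocabulary and untied embeddings (both assumptions not stated above):

```python
# Rough sanity check of the figures quoted above.

vocab_size = 50_432          # GPT-NeoX tokenizer vocabulary (assumption)
hidden_dim = 4_096           # model dimension
batch_tokens = 2_097_152     # "2M" tokens per step (2**21)
steps = 143_000

# Tokens seen during training: 2,097,152 * 143,000 ≈ 299.9B
print(f"Tokens seen: {batch_tokens * steps:,}")

# Input embedding + untied unembedding matrix bridges the parameter counts:
# 6,444,163,072 + 2 * 50,432 * 4,096 = 6,857,302,016
embedding_params = 2 * vocab_size * hidden_dim
non_embedding_params = 6_444_163_072
print(f"Total params: {non_embedding_params + embedding_params:,}")
```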
Core Capabilities
- English-language text generation and completion
- Scientific research and model interpretability studies
- Supports checkpoint analysis with 154 intermediate checkpoints
- Compatible with the Hugging Face Transformers library (see the loading sketch below)
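As a minimal usage sketch (assuming the `EleutherAI/pythia-6.9b-deduped` Hugging Face repository and its step-numbered checkpoint branches), the model and any intermediate checkpoint can be loaded by passing the branch name as `revision`:

```python
from transformers import AutoTokenizer, GPTNeoXForCausalLM

# Load the final checkpoint; pass an earlier step-numbered branch
# (e.g. "step3000") as `revision` to study an intermediate checkpoint.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-6.9b-deduped",
    revision="step143000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-6.9b-deduped",
    revision="step143000",
)

inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(tokens[0]))
```

The intermediate checkpoints follow the same `stepN` branch-naming convention, which is what makes the suite convenient for studying behavior across training.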
Frequently Asked Questions
Q: What makes this model unique?
This model is part of a carefully controlled experimental suite designed for interpretability research. It was trained on deduplicated data with consistent, precisely spaced checkpointing, which makes it valuable for studying how model behavior develops over the course of training.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, particularly in studying language model behavior and interpretability. It's not recommended for deployment in production environments or direct user-facing applications.