pythia-6.9b-deduped

Maintained by: EleutherAI

Pythia-6.9B-Deduped

Parameter Count: 6.9B (6,857,302,016 total)
Model Type: Transformer-based Language Model
License: Apache 2.0
Paper: Link
Training Data: The Pile (Deduplicated)

What is pythia-6.9b-deduped?

Pythia-6.9B-Deduped is a large language model from EleutherAI's Pythia Scaling Suite, a collection of models designed specifically for interpretability research. The model has 32 transformer layers, a model dimension of 4096, and 32 attention heads, and was trained on a deduplicated version of The Pile dataset.

Implementation Details

The model uses the GPT-NeoX architecture and was trained with a batch size of 2M tokens and a learning rate of 1.2 × 10⁻⁴ for 143,000 steps, seeing approximately 299.9B tokens in total. The key architectural figures are listed below, followed by a short configuration sketch.

  • Architecture: 32 transformer layers with 4096 dimensional states
  • Attention Heads: 32
  • Non-embedding Parameters: 6,444,163,072
  • Training Dataset: Deduplicated version of The Pile
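
These figures can be read straight from the published configuration on the Hugging Face Hub. A minimal sketch, assuming the `transformers` library and the standard `EleutherAI/pythia-6.9b-deduped` Hub ID:

```python
from transformers import AutoConfig

# Fetches only config.json, not the full model weights.
config = AutoConfig.from_pretrained("EleutherAI/pythia-6.9b-deduped")

print(config.num_hidden_layers)    # 32 transformer layers
print(config.hidden_size)          # 4096-dimensional hidden states
print(config.num_attention_heads)  # 32 attention heads
print(config.vocab_size)           # embedding vocabulary size

# The gap between the total and non-embedding parameter counts above is
# approximately 2 * vocab_size * hidden_size, since GPT-NeoX keeps
# separate input and output embedding matrices.
```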

Core Capabilities

  • English language text generation and completion
  • Scientific research and model interpretability studies
  • Supports checkpoint analysis with 154 intermediate checkpoints
  • Compatible with the Hugging Face Transformers library (see the loading sketch after this list)
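
As an illustration of the Transformers compatibility noted above, the sketch below loads the final model and runs a short completion. It assumes the standard `EleutherAI/pythia-6.9b-deduped` Hub ID and a GPU with enough memory; the half-precision and device-placement choices are illustrative rather than required (`device_map="auto"` needs the `accelerate` package).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-6.9b-deduped"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # keeps the 6.9B weights within a single large GPU
    device_map="auto",
)

# Greedy completion of a short English prompt.
inputs = tokenizer("The Pile is a dataset that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```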

Frequently Asked Questions

Q: What makes this model unique?

This model is part of a carefully controlled experimental suite designed for interpretability research, trained on deduplicated data with precise checkpointing, making it valuable for studying model behavior and development.
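
For example, each of the 154 checkpoints is published as a separate branch on the Hugging Face Hub (log-spaced early checkpoints such as step1 and step512, then every 1,000 steps up to step143000), so the model from any point in training can be loaded by revision. A minimal sketch, assuming that step-numbered branch naming:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-6.9b-deduped"

# Load the model as it existed partway through training; omitting `revision`
# (or using "step143000") gives the fully trained model.
model = AutoModelForCausalLM.from_pretrained(model_id, revision="step3000")
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="step3000")
```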

Q: What are the recommended use cases?

The model is primarily intended for research purposes, particularly in studying language model behavior and interpretability. It's not recommended for deployment in production environments or direct user-facing applications.

🍰 Interested in building your own agents?
PromptLayer provides Hugging Face integration tools to manage and monitor prompts with your whole team. Get started here.