Gemma Scope
| Property | Value |
|---|---|
| License | cc-by-4.0 |
| Paper | arXiv:2408.05147 |
| Author | |
What is Gemma Scope?
Gemma Scope is a suite of sparse autoencoders (SAEs) for analyzing the internal workings of Google's Gemma 2 language models. Acting as a kind of microscope for the model, it decomposes activations into interpretable features across the 2B, 9B, and 27B model sizes.
Implementation Details
The suite provides sparse autoencoders at a range of widths (from 2^14 to 2^20 latents) trained on several architectural sites, including attention outputs, MLP outputs, and the residual stream. Each SAE is trained on between 4B and 16B tokens of activations.
- Multiple model variants for the 2B, 9B, and 27B parameter versions
- Comprehensive coverage of attention, MLP, and residual stream analysis
- Various SAE widths from ~16K to 1M neurons
- Support for both pretrained (PT) and instruction-tuned (IT) models
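To make the core idea concrete, here is a minimal sketch of an SAE forward pass of the kind described above, using a JumpReLU-style activation (a latent fires only when its pre-activation exceeds a learned per-latent threshold). All sizes and parameter names here are illustrative, not the released weights or their exact layout:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32  # toy sizes; real Gemma Scope SAEs use 2^14 to 2^20 latents

# Hypothetical parameters, loosely following a JumpReLU SAE formulation
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
b_dec = np.zeros(d_model)
theta = np.full(d_sae, 0.05)  # learned per-latent thresholds

def encode(x):
    pre = x @ W_enc + b_enc
    # JumpReLU: keep a latent's value only if it exceeds its threshold
    return pre * (pre > theta)

def decode(f):
    return f @ W_dec + b_dec

x = rng.normal(size=d_model)  # a model activation vector (e.g. residual stream)
f = encode(x)                 # sparse, interpretable feature activations
x_hat = decode(f)             # reconstruction of the original activation
```

The sparsity of `f` is what makes the decomposition interpretable: each active latent ideally corresponds to a single human-understandable concept.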
Core Capabilities
- Detailed analysis of internal model activations
- Interactive visualization through the Neuronpedia demo
- Support for multiple model architectures and sizes
- Extensive documentation and tutorials available
- Integration with both PyTorch and JAX frameworks
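As a sketch of the "detailed analysis" bullet above: once activations have been run through an SAE, a few standard metrics summarize how faithful and how sparse the decomposition is. The helper below is illustrative (the function name and toy data are assumptions, not part of the released tooling):

```python
import numpy as np

def sae_metrics(x, x_hat, f):
    """Common SAE evaluation metrics used in interpretability work."""
    l0 = np.mean(np.count_nonzero(f, axis=-1))  # avg. active latents per token
    mse = np.mean((x - x_hat) ** 2)             # reconstruction error
    # Fraction of variance unexplained (FVU): 0 means perfect reconstruction
    fvu = np.sum((x - x_hat) ** 2) / np.sum((x - x.mean(axis=0)) ** 2)
    return l0, mse, fvu

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))                           # toy batch of activations
f = np.maximum(rng.normal(size=(16, 32)) - 1.0, 0.0)   # toy sparse latents
x_hat = x + 0.1 * rng.normal(size=(16, 8))             # near-faithful reconstruction

l0, mse, fvu = sae_metrics(x, x_hat, f)
```

Lower FVU at a given L0 indicates a better sparsity/fidelity trade-off, which is the axis along which SAEs of different widths are typically compared.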
Frequently Asked Questions
Q: What makes this model unique?
Gemma Scope stands out for the breadth of transparency it offers into large language model internals: it gives researchers open tools to study how these models process and represent information at multiple layers and levels of abstraction.
Q: What are the recommended use cases?
The primary use cases include AI safety research, model interpretability studies, and detailed analysis of how language models process and represent information internally. It's particularly valuable for researchers working on model transparency and safety alignment.