Gemma Scope
| Property | Value |
|---|---|
| License | cc-by-4.0 |
| Paper | arXiv:2408.05147 |
| Author | |
What is Gemma Scope?
Gemma Scope is a suite of sparse autoencoders (SAEs) for analyzing the internal workings of Google's Gemma 2 language models. Acting as a kind of microscope for the model, it decomposes activations into interpretable features across the 2B, 9B, and 27B model sizes.
Implementation Details
The suite provides sparse autoencoders at a range of widths (from 2^14 to 2^20 latents) trained on several architectural sites, including attention outputs, MLP outputs, and the residual stream. Each SAE is trained on between 4B and 16B tokens of activations.
- Multiple model variants for the 2B, 9B, and 27B parameter versions
- Comprehensive coverage of attention, MLP, and residual stream analysis
- Various SAE widths from ~16K to 1M neurons
- Support for both pretrained (PT) and instruction-tuned (IT) models
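To make the core idea concrete, here is a minimal sketch of an SAE forward pass of the kind described above, using a JumpReLU-style activation (a latent fires only when its pre-activation exceeds a learned per-latent threshold). All sizes and parameter names here are illustrative, not the released weights or their exact layout:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32  # toy sizes; real Gemma Scope SAEs use 2^14 to 2^20 latents

# Hypothetical parameters, loosely following a JumpReLU SAE formulation
W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
b_dec = np.zeros(d_model)
theta = np.full(d_sae, 0.05)  # learned per-latent thresholds

def encode(x):
    pre = x @ W_enc + b_enc
    # JumpReLU: keep a latent's value only if it exceeds its threshold
    return pre * (pre > theta)

def decode(f):
    return f @ W_dec + b_dec

x = rng.normal(size=d_model)  # a model activation vector (e.g. residual stream)
f = encode(x)                 # sparse, interpretable feature activations
x_hat = decode(f)             # reconstruction of the original activation
```

The sparsity of `f` is what makes the decomposition interpretable: each active latent ideally corresponds to a single human-understandable concept.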
Core Capabilities
- Detailed analysis of internal model activations
- Interactive visualization through the Neuronpedia demo
- Support for multiple model architectures and sizes
- Extensive documentation and tutorials available
- Integration with both PyTorch and JAX frameworks
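As a sketch of the "detailed analysis" bullet above: once activations have been run through an SAE, a few standard metrics summarize how faithful and how sparse the decomposition is. The helper below is illustrative (the function name and toy data are assumptions, not part of the released tooling):

```python
import numpy as np

def sae_metrics(x, x_hat, f):
    """Common SAE evaluation metrics used in interpretability work."""
    l0 = np.mean(np.count_nonzero(f, axis=-1))  # avg. active latents per token
    mse = np.mean((x - x_hat) ** 2)             # reconstruction error
    # Fraction of variance unexplained (FVU): 0 means perfect reconstruction
    fvu = np.sum((x - x_hat) ** 2) / np.sum((x - x.mean(axis=0)) ** 2)
    return l0, mse, fvu

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))                           # toy batch of activations
f = np.maximum(rng.normal(size=(16, 32)) - 1.0, 0.0)   # toy sparse latents
x_hat = x + 0.1 * rng.normal(size=(16, 8))             # near-faithful reconstruction

l0, mse, fvu = sae_metrics(x, x_hat, f)
```

Lower FVU at a given L0 indicates a better sparsity/fidelity trade-off, which is the axis along which SAEs of different widths are typically compared.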
Frequently Asked Questions
Q: What makes this model unique?
Gemma Scope stands out for the breadth of transparency it offers into large language model internals: it gives researchers open tools to study how these models process and represent information at multiple layers and levels of abstraction.
Q: What are the recommended use cases?
The primary use cases include AI safety research, model interpretability studies, and detailed analysis of how language models process and represent information internally. It's particularly valuable for researchers working on model transparency and safety alignment.