Jan Leike
Alignment researcher at Anthropic. He co-led OpenAI's Superalignment team before joining Anthropic in 2024.
Who is Jan Leike?
Jan Leike is an alignment researcher at Anthropic. He is best known for co-leading OpenAI’s Superalignment team and for his current work on alignment science, including scalable oversight and weak-to-strong generalization. (jan.leike.name)
Background and career
Leike’s public bio says he leads the Alignment Science team at Anthropic. Before that, he co-led OpenAI’s Superalignment team and helped shape OpenAI’s approach to alignment research. (jan.leike.name)
Earlier in his career, he was an alignment researcher at DeepMind, where he worked on reinforcement learning and human feedback. He also holds a PhD in reinforcement learning theory from the Australian National University. (jan.leike.name)
Key facts about Jan Leike include:
- Current role: Leads Anthropic’s Alignment Science team.
- Known for: Co-leading OpenAI’s Superalignment effort.
- Earlier work: Alignment research at DeepMind.
- Research focus: Scalable oversight, weak-to-strong generalization, and robustness to jailbreaks.
- Background: PhD in reinforcement learning theory from ANU.
Notable contributions
- Superalignment roadmap: Co-authored OpenAI’s Superalignment research direction, which aimed to solve superintelligence alignment within four years. (openai.com)
- Alignment research at OpenAI: Helped develop OpenAI’s alignment work around models like InstructGPT, ChatGPT, and GPT-4, according to his public bio. (jan.leike.name)
- Weak-to-strong generalization: His current Anthropic research includes studying how weaker supervision can help train stronger systems. (jan.leike.name)
- Scalable oversight: He has helped push methods for supervising systems that are harder for humans to evaluate directly. (jan.leike.name)
- RL and safety foundations: His earlier academic work contributed to formal work on reinforcement learning, exploration, and agent alignment. (jan.leike.name)
Why he matters in AI today
- Frontier safety: Leike’s work sits at the center of the question builders care about most: how to keep increasingly capable models steerable.
- Practical alignment methods: His research focuses on methods teams can actually test, like scalable oversight, evals, and training stronger models with weaker supervision.
- Research taste: He has influenced which alignment problems get treated as tractable engineering problems rather than abstract theory.
- Operational relevance: His work maps well to production workflows where teams need structured prompt iteration, evals, and safety checks.
- Cross-lab relevance: His move from OpenAI to Anthropic reflects how alignment research now spans the labs shaping frontier model development.
Where to follow his work
The most direct source is his personal site, which lists his current Anthropic role, publications, and research interests. His site also links to his X account, @janleike. (jan.leike.name)
For current work, Anthropic’s Alignment Science blog is the main place to watch for papers and research notes with his name attached. OpenAI’s archived Superalignment posts are also useful for historical context. (alignment.anthropic.com)
How PromptLayer connects with Jan Leike's work
Leike’s research emphasizes scalable oversight, evaluation, and safer iteration on model behavior, which is exactly where PromptLayer helps teams stay organized. We give teams a place to manage prompts, compare outputs, and track evaluation signals as they refine systems for reliability and safety.
Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.