John Schulman

Co-founder of OpenAI and co-inventor of PPO; he joined Anthropic in 2024 and later moved to Mira Murati's Thinking Machines Lab.

Who is John Schulman?

John Schulman is an OpenAI co-founder best known as a co-inventor of Proximal Policy Optimization (PPO), one of the most widely used reinforcement learning methods in modern AI. In 2024, he left OpenAI for Anthropic, and later moved to Mira Murati’s Thinking Machines Lab. (openai.com)

Background and career

Schulman was part of OpenAI’s founding team, named in the company’s original introduction post. His early work at OpenAI helped shape reinforcement learning and alignment research, including methods for training models from human feedback. (openai.com)

He is most closely associated with PPO, which OpenAI released in 2017, and with later work that helped make instruction tuning and alignment training practical for large language models. He left OpenAI for Anthropic in 2024, saying he wanted to deepen his focus on AI alignment, and Thinking Machines Lab later listed him as Chief Scientist. (openai.com)
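For readers unfamiliar with PPO's core idea, here is a minimal, illustrative sketch of its clipped surrogate objective in plain Python. The function name and toy numbers are our own, not OpenAI's code; the point is that clipping the probability ratio keeps each policy update from straying too far from the policy that collected the data.

```python
# Minimal sketch of PPO's clipped surrogate objective (illustrative only;
# names and the toy setup are assumptions, not OpenAI's implementation).
import numpy as np

def ppo_clip_objective(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of actions."""
    # Probability ratio between the current policy and the old (data-collecting) policy.
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO takes the elementwise minimum, which removes the incentive to push
    # the ratio far outside [1 - eps, 1 + eps].
    return np.mean(np.minimum(unclipped, clipped))

# Toy usage with made-up numbers.
new_lp = np.array([-0.9, -1.1, -0.3])
old_lp = np.array([-1.0, -1.0, -1.0])
adv = np.array([1.5, -0.5, 2.0])
print(ppo_clip_objective(new_lp, old_lp, adv))
```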

Key facts about John Schulman include:

  1. Current affiliation: Thinking Machines Lab lists him as Chief Scientist. (thinkingmachines.ai)
  2. OpenAI role: He was a founding member of OpenAI. (openai.com)
  3. Signature contribution: He co-authored PPO, which OpenAI described as a simpler, strong-performing RL algorithm. (openai.com)
  4. Alignment work: He was involved in OpenAI’s human-feedback and alignment research. (openai.com)
  5. 2024 move: He left OpenAI for Anthropic in 2024. (techcrunch.com)

Notable contributions

  1. OpenAI founding team: Schulman was one of the original OpenAI co-founders. (openai.com)
  2. Proximal Policy Optimization: He co-invented PPO, a landmark RL algorithm still used as a default training method in many settings. (openai.com)
  3. Learning from human preferences: He contributed to OpenAI’s early preference-based alignment work that influenced RLHF-style training (see the sketch after this list). (openai.com)
  4. Alignment research leadership: OpenAI credited him with advancing alignment research during his tenure. (techcrunch.com)
  5. Thinking Machines Lab research: He later joined Thinking Machines Lab, where public materials position the lab around customizable, collaborative AI systems. (thinkingmachines.ai)
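To make the preference-learning item above concrete, the snippet below sketches the pairwise (Bradley-Terry style) loss commonly used to train reward models in RLHF-style pipelines. It is an illustrative formulation under our own naming and toy data, not OpenAI's implementation.

```python
# Illustrative sketch of a pairwise preference loss of the kind used in
# RLHF-style reward modeling (names and numbers are assumptions).
import numpy as np

def preference_loss(reward_chosen, reward_rejected):
    """Push the reward of the preferred response above the rejected one."""
    # -log(sigmoid(r_chosen - r_rejected)), averaged over comparison pairs.
    margin = reward_chosen - reward_rejected
    return np.mean(np.log1p(np.exp(-margin)))

# Toy usage: scalar rewards a reward model might assign to paired responses.
chosen = np.array([1.2, 0.4, 2.0])
rejected = np.array([0.3, 0.9, -0.5])
print(preference_loss(chosen, rejected))
```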

Why they matter in AI today

  1. PPO still shapes post-training: Many builders learn PPO because it remains a core reference point for RL-based model tuning. (openai.com)
  2. Alignment is mainstream: Schulman’s career tracks the shift from raw model training to safer, preference-aware model behavior. (openai.com)
  3. Useful for agentic systems: His work matters to teams building agents that must learn from feedback, rewards, and evaluation loops. (openai.com)
  4. Research-to-product bridge: His path shows how research ideas become production training recipes used across frontier labs. (openai.com)
  5. Post-training is now a discipline: Schulman’s career highlights why evaluation, iteration, and alignment are now central to shipping useful AI. (openai.com)

Where to follow their work

The best public place to track Schulman’s current work is Thinking Machines Lab, which lists him on its team page and publishes research updates on its site. (thinkingmachines.ai)

For historical context, OpenAI’s posts on PPO, human preferences, and alignment research capture the ideas most closely associated with his early work. (openai.com)

How PromptLayer connects with John Schulman's work

Schulman’s career is a useful reminder that strong AI systems depend on feedback loops, evaluation, and post-training discipline. PromptLayer helps teams manage prompts, inspect model behavior, and track the kinds of experiments and iterations that sit alongside PPO-style training and modern alignment workflows.

Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.
