Information for Jan Leike

Basic information

Item	Value
Facebook username	100009882604264
Intelligent Agent Foundations Forum username	160
Donations List Website (data still preliminary)
Agendas	Recursive reward modeling

List of positions

Organization	Title	Start date	End date	AI safety relation	Source	Notes
Australian National University					[1]
Machine Intelligence Research Institute	Research Advisor	2017-03-01	2018-05-01	position	[2], [3]
Future of Humanity Institute	Research Associate	2017-11-24	2024-04-16		[4], [5], [6], [7]
Machine Intelligence Research Institute	Spotlighted Advisor	2018-09-01	2018-09-02	position	[8], [9]
OpenAI	Executive	2021-01-01	2024-05-16		[10], [11]	head of alignment, superalignment lead, and executive

Documents

Title	Publication date	Author	Publisher	Affected organizations	Affected people	Affected agendas	Notes
New safety research agenda: scalable agent alignment via reward modeling	2018-11-20	Victoria Krakovna	LessWrong	Google DeepMind	Jan Leike	Recursive reward modeling, Iterated amplification	Blog post on LessWrong announcing the recursive reward modeling agenda. Comments in the discussion thread clarify several aspects of the agenda. These include its relation to Paul Christiano’s iterated amplification agenda, whether the DeepMind safety team is addressing the question of whether the human user is a safe agent, and further detail on the alternating quantifiers in the analogy to complexity theory. Jan Leike is listed as an affected person because he is the lead author of the paper the post announces, he is mentioned in the post, and he responds to several questions raised in the comments.
Scalable agent alignment via reward modeling: a research direction	2018-11-19	Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg	arXiv	Google DeepMind		Recursive reward modeling, Imitation learning, Inverse reinforcement learning, Cooperative inverse reinforcement learning, Myopic reinforcement learning, Iterated amplification, Debate	This paper introduces the (recursive) reward modeling agenda, outlining the approach, the challenges it faces, and possible ways to address them. It also discusses alternative agendas and how they relate to reward modeling.

Showing at most 20 people who are most similar in terms of which organizations they have worked at.

Person	Number of organizations in common	List of organizations in common
Paul Christiano	3	Future of Humanity Institute, Machine Intelligence Research Institute, OpenAI
Ryan Carey	3	Future of Humanity Institute, Machine Intelligence Research Institute, OpenAI
Robin Hanson	2	Future of Humanity Institute, Machine Intelligence Research Institute
Miles Brundage	2	Future of Humanity Institute, OpenAI
Nick Bostrom	2	Future of Humanity Institute, Machine Intelligence Research Institute
Katja Grace	2	Future of Humanity Institute, Machine Intelligence Research Institute
Stuart Armstrong	2	Future of Humanity Institute, Machine Intelligence Research Institute
Helen Toner	2	Future of Humanity Institute, OpenAI
Carl Shulman	2	Future of Humanity Institute, Machine Intelligence Research Institute
Daniel Dewey	2	Future of Humanity Institute, Machine Intelligence Research Institute
Girish Sastry	2	Future of Humanity Institute, OpenAI
Benjamin Mann	2	Machine Intelligence Research Institute, OpenAI
Jeremy Schlatter	2	Machine Intelligence Research Institute, OpenAI
Jarryd Martin	1	Australian National University
Marcus Hutter	1	Australian National University
Tom Everitt	1	Australian National University
Elliot Catt	1	Australian National University
Alan Hájek	1	Australian National University
Gary Lea	1	Australian National University
Allan Dafoe	1	Future of Humanity Institute