Information for Iterated amplification

Basic information

Associated people: Paul Christiano, Buck Shlegeris, Dario Amodei

Associated organizations: OpenAI

Overview

Iterated amplification (also called iterated distillation and amplification) aims to build a powerful aligned AGI by repeatedly invoking two steps: (1) amplification and (2) distillation.

  1. In the amplification step, a human uses multiple copies of an AGI assistant (which starts out not very capable) to accomplish some task. The hope is that the combined system of the human with multiple copies of the AGI will be more capable than the human or the AGI alone, since the human will be able to delegate tasks to the AGI. This is similar to how the CEO of a company can accomplish much more by hiring and delegating to employees, even if each employee is less capable than the CEO. The combined system of the human with multiple copies of the AGI is called the “amplified system”.
  2. In the distillation step, an AI is trained using narrow methods (such as imitation learning) to replicate the behavior of the amplified system. The hope is that this “distilled” system will be just as capable as the amplified system while being much less computationally expensive. The distilled system is also supposed to remain aligned (because it was trained using narrow methods). In the next round of amplification/distillation, this distilled system becomes the new AGI assistant.
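The two steps above can be illustrated with a toy sketch. This is not the method from any of the cited sources. It assumes a trivial task (summing a tuple of integers), a hard-coded "human" who can only split a question in half and add two numbers, and distillation by plain memorization as a stand-in for imitation learning. All names (initial_agent, amplify, distill, iterated_amplification) are hypothetical, and the sketch says nothing about whether alignment is preserved.

```python
from typing import Callable, Iterable, Optional, Tuple

Question = Tuple[int, ...]
Agent = Callable[[Question], Optional[int]]  # returns None for "I don't know"


def initial_agent(q: Question) -> Optional[int]:
    """Weak starting assistant (assumption): only handles length <= 1."""
    if len(q) == 0:
        return 0
    if len(q) == 1:
        return q[0]
    return None


def amplify(agent: Agent) -> Agent:
    """Amplified system: a 'human' who delegates to copies of the agent.

    The human's only skills here are splitting a question in half and
    adding the two sub-answers; everything else is delegated.
    """
    def amplified(q: Question) -> Optional[int]:
        direct = agent(q)
        if direct is not None:
            return direct
        mid = len(q) // 2
        left, right = agent(q[:mid]), agent(q[mid:])
        if left is None or right is None:
            return None  # a sub-question was too hard for the assistant
        return left + right
    return amplified


def distill(amplified: Agent, training_questions: Iterable[Question]) -> Agent:
    """Distillation stand-in: memorize the amplified system's answers.

    A real system would train a model by imitation; a lookup table is
    used here only to keep the example self-contained.
    """
    table = {}
    for q in training_questions:
        answer = amplified(q)
        if answer is not None:
            table[q] = answer

    def distilled(q: Question) -> Optional[int]:
        return table.get(q)  # cheap: no delegation at answer time
    return distilled


def iterated_amplification(base: Question, rounds: int) -> Agent:
    """Alternate amplification and distillation for a number of rounds."""
    n = len(base)
    # Training questions: every contiguous slice of `base`, so that the
    # halves of any training question are themselves training questions.
    questions = [base[i:j] for i in range(n + 1) for j in range(i, n + 1)]
    agent = initial_agent
    for _ in range(rounds):
        agent = distill(amplify(agent), questions)
    return agent
```

In this toy setting each round doubles the question length the assistant can handle: starting from length 1, an 8-element question becomes answerable after three rounds but not after two. The distilled agent answers by a single table lookup, which is the sense in which it is cheaper than the amplified system it imitates.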

One specific version of iterated amplification has been called “imitating expert reasoning” in the reward modeling paper (see also this comment).

Goals of the agenda

Iterated amplification intends to build powerful AGI assistants that try to help humans.

The agenda does not intend to solve all problems, e.g. it doesn’t aim to solve philosophy (although to the extent that humans are working on these problems, the AGI assistant would be able to help with them). See “Paul’s research agenda FAQ” § Goals and non-goals for more information.

Assumptions the agenda makes

AI timelines

No particular assumptions about AI timelines, as far as I know.

Nature of intelligence

Iterated amplification is intended to be able to deal with the case of prosaic AGI, i.e. the case where humanity is able to build AGI without learning anything fundamentally new about the nature of intelligence. In other words, iterated amplification is intended to align scaled-up versions of current machine learning systems.

Other

Iterated amplification has some “key hopes” it is based on:

Documents

| Title | Publication date | Author | Publisher | Affected organizations | Affected people | Notes |
|---|---|---|---|---|---|---|
| Challenges to Christiano’s capability amplification proposal | 2018-05-19 | Eliezer Yudkowsky | Machine Intelligence Research Institute | | Paul Christiano | This post was summarized in Alignment Newsletter #7 [1]. |
| AI Alignment Podcast: An Overview of Technical AI Alignment with Rohin Shah (Part 1) | 2019-04-11 | Lucas Perry | Future of Life Institute | | Rohin Shah | Part one of an interview with Rohin Shah that covers some technical agendas for AI alignment. |
| AI Alignment Podcast: An Overview of Technical AI Alignment with Rohin Shah (Part 2) | 2019-04-25 | Lucas Perry | Future of Life Institute | | Rohin Shah, Dylan Hadfield-Menell, Gillian Hadfield | Part two of a podcast episode that goes into detail about some technical approaches to AI alignment. |
| Scalable agent alignment via reward modeling: a research direction | 2018-11-19 | Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg | arXiv | Google DeepMind | | This paper introduces the (recursive) reward modeling agenda, discussing its basic outline, challenges, and ways to overcome those challenges. The paper also discusses alternative agendas and their relation to reward modeling. |
| New safety research agenda: scalable agent alignment via reward modeling | 2018-11-20 | Victoria Krakovna | LessWrong | Google DeepMind | Jan Leike | Blog post on LessWrong announcing the recursive reward modeling agenda. Some comments in the discussion thread clarify various aspects of the agenda, including its relation to Paul Christiano’s iterated amplification agenda, whether the DeepMind safety team is thinking about the problem of whether the human user is a safe agent, and more details about alternating quantifiers in the analogy to complexity theory. Jan Leike is listed as an affected person for this document because he is the lead author and is mentioned in the blog post, and also because he responds to several questions raised in the comments. |