Information for iterated amplification

Basic information

Associated people: Paul Christiano, Buck Shlegeris, Dario Amodei

Associated organizations: OpenAI

Overview

Iterated amplification (also called iterated distillation and amplification) aims to build a powerful aligned AGI by repeatedly invoking two steps: (1) amplification and (2) distillation.

In the amplification step, a human uses multiple copies of an AGI assistant (which starts out not being very capable) to accomplish some task. The hope is that the combined system of the human with multiple copies of the AGI will be more capable than the human or AGI alone, since the human will be able to delegate tasks to the AGI. This is similar to how the CEO of a company can accomplish much more by hiring and delegating to employees, even if each employee is less capable than the CEO. The combined system of the human with multiple copies of the AGI is called the “amplified system”.
In the distillation step, an AI is trained using narrow methods (such as imitation learning) to replicate the behavior of the amplified system. The hope is that this “distilled” system will be just as capable as the amplified system while being much less computationally expensive. The distilled system is also supposed to remain aligned (because it was trained using narrow methods). In the next round of amplification/distillation, this distilled system becomes the new AGI assistant.
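
The two steps above can be sketched as a toy loop. This is only an illustrative sketch under strong assumptions, not the method used in any actual implementation: the "task" is adding up a tuple of integers, the "human" only splits a task in half and adds two sub-answers, and "distillation" is a lookup table that memorises the amplified system's answers (a stand-in for imitation learning, which would normally train a model that generalises). All names here (weak_agent, amplify, distill, slices, iterate) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Task = Tuple[int, ...]         # toy task: add up a tuple of integers
Agent = Callable[[Task], int]  # an agent maps a task to an answer


def weak_agent(task: Task) -> int:
    """Initial assistant: can only handle tasks with at most one number."""
    if len(task) > 1:
        raise ValueError("task too hard for this agent")
    return task[0] if task else 0


def amplify(agent: Agent) -> Agent:
    """Amplification: a 'human' delegates to two copies of the agent."""
    def amplified(task: Task) -> int:
        mid = len(task) // 2
        # The human only decomposes the task and combines the sub-answers.
        return agent(task[:mid]) + agent(task[mid:])
    return amplified


def distill(amplified: Agent, training_tasks: List[Task]) -> Agent:
    """Distillation stand-in: memorise the amplified system's answers.

    Assumption: a real system would train a model by imitation learning;
    a lookup table is used here only to keep the sketch self-contained.
    """
    table: Dict[Task, int] = {t: amplified(t) for t in training_tasks}

    def distilled(task: Task) -> int:
        if task not in table:
            raise ValueError("task outside what the distilled agent learned")
        return table[task]
    return distilled


def slices(task: Task, max_len: int) -> List[Task]:
    """All contiguous sub-tasks of `task` with at most `max_len` numbers."""
    n = len(task)
    return [task[i:j] for i in range(n + 1) for j in range(i, n + 1)
            if j - i <= max_len]


def iterate(target: Task, rounds: int) -> Agent:
    """Alternate amplification and distillation for a number of rounds."""
    agent: Agent = weak_agent
    for k in range(1, rounds + 1):
        amplified = amplify(agent)  # slower but more capable
        # The distilled agent becomes the assistant for the next round.
        agent = distill(amplified, slices(target, 2 ** k))
    return agent


if __name__ == "__main__":
    target = (3, 1, 4, 1, 5, 9, 2, 6)
    agent = iterate(target, rounds=3)
    print(agent(target))  # 31
```

In this toy, each round doubles the length of task the assistant can handle (1, then 2, 4, 8 numbers), which mirrors the hope that capability grows with each round. Whether alignment is preserved through distillation is the part the toy does not model.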

One specific version of iterated amplification has been called “imitating expert reasoning” in the reward modeling paper (see also this comment).

Goals of the agenda

Iterated amplification intends to build powerful AGI assistants that try to help humans.

The agenda does not intend to solve all problems, e.g. it doesn’t aim to solve philosophy (although to the extent that humans can make progress on these problems, the AGI assistant would be able to help with them). See “Paul’s research agenda FAQ” § Goals and non-goals for more information.

Assumptions the agenda makes

AI timelines

No particular assumptions about AI timelines, as far as I know.

Nature of intelligence

Iterated amplification is intended to be able to deal with the case of prosaic AGI, i.e. the case where humanity is able to build AGI without learning anything fundamentally new about the nature of intelligence. In other words, iterated amplification aims to align scaled-up versions of current machine learning systems.

Other

Iterated amplification has some “key hopes” it is based on:

If one has an overseer who is smarter than the agent being trained, then it is possible to use the overseer’s judgment as an objective to train the agent.
It is possible to train a reinforcement learning system using very sparse feedback (so it is fine for the overseer to be computationally expensive).
A team of aligned agents will be smarter/more capable than any individual agent while remaining aligned.

Documents

Title	Publication date	Author	Publisher	Affected organizations	Affected people	Affected agendas	Notes
Challenges to Christiano’s capability amplification proposal	2018-05-19	Eliezer Yudkowsky	Machine Intelligence Research Institute		Paul Christiano	Iterated amplification