Associated people: Paul Christiano, Buck Shlegeris, Dario Amodei
Associated organizations: OpenAI
Iterated amplification (also called iterated distillation and amplification) aims to build a powerful aligned AGI by repeatedly invoking two steps: (1) amplification and (2) distillation.
One specific version of iterated amplification has been called “imitating expert reasoning” in the reward modeling paper (see also this comment).
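To make the amplification/distillation loop concrete, here is a minimal toy sketch in Python. Everything in it is illustrative: `base_overseer`, `decompose`, `train_imitation`, and the memorization-based "distillation" are hypothetical stand-ins for this sketch, not functions from any actual implementation of the agenda; in practice the distillation step would train a machine learning model (e.g. via imitation learning or reinforcement learning) to approximate the slower amplified system.

```python
# Toy sketch of the iterated amplification loop (amplify, then distill).
# All names here (base_overseer, decompose, train_imitation, ...) are
# illustrative placeholders, not APIs from any real implementation.

from typing import Callable, Dict, List

Agent = Callable[[str], str]  # an agent maps a question to an answer


def base_overseer(question: str) -> str:
    """Stand-in for the human overseer H answering a question directly."""
    return f"H's best direct answer to: {question}"


def decompose(question: str) -> List[str]:
    """Stand-in for breaking a hard question into easier subquestions."""
    return [f"{question} (subquestion {i})" for i in range(1, 3)]


def amplify(human: Agent, agent: Agent) -> Agent:
    """Amplification: the overseer answers a question with help from
    several copies of the current agent, one per subquestion."""
    def amplified(question: str) -> str:
        sub_answers = [agent(q) for q in decompose(question)]
        # The overseer combines the subanswers into an overall answer.
        return human(question + " | given: " + "; ".join(sub_answers))
    return amplified


def train_imitation(target: Agent, questions: List[str]) -> Agent:
    """Distillation: train a fast model to imitate the (slow) amplified
    system. Here the 'training' is just memorizing question-answer pairs."""
    dataset: Dict[str, str] = {q: target(q) for q in questions}

    def distilled(question: str) -> str:
        return dataset.get(question, "(no learned answer)")
    return distilled


def iterated_amplification(rounds: int, questions: List[str]) -> Agent:
    agent: Agent = base_overseer  # start from the overseer's own ability
    for _ in range(rounds):
        amplified = amplify(base_overseer, agent)      # step 1: amplification
        agent = train_imitation(amplified, questions)  # step 2: distillation
    return agent


if __name__ == "__main__":
    final = iterated_amplification(rounds=2, questions=["How do I plan a trip?"])
    print(final("How do I plan a trip?"))
```

The intent of the loop is that each distilled agent inherits (most of) the capability of the amplified system it imitates, so that repeated rounds can exceed what the overseer could do alone while the overseer's role in amplification keeps the process anchored to what the human endorses.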
Iterated amplification aims to build powerful AGI assistants that try to help humans.
The agenda does not intend to solve all problems, e.g. it doesn’t aim to solve philosophy (although to the extent that humans could make progress on such problems, the AGI assistant would be able to help with them). See “Paul’s research agenda FAQ” § Goals and non-goals for more information.
No particular assumptions about AI timelines, as far as I know.
Iterated amplification is intended to handle the case of prosaic AGI, i.e. the case where humanity builds AGI without learning anything fundamentally new about the nature of intelligence. In other words, iterated amplification aims to align scaled-up versions of current machine learning systems.
Iterated amplification is based on some “key hopes”.
Title | Publication date | Author | Publisher | Affected organizations | Affected people | Affected agendas | Notes |
---|---|---|---|---|---|---|---|
Challenges to Christiano’s capability amplification proposal | 2018-05-19 | Eliezer Yudkowsky | Machine Intelligence Research Institute | | Paul Christiano | Iterated amplification | This post was summarized in Alignment Newsletter #7 [1]. |
AI Alignment Podcast: An Overview of Technical AI Alignment with Rohin Shah (Part 1) | 2019-04-11 | Lucas Perry | Future of Life Institute | | Rohin Shah | Iterated amplification | Part one of an interview with Rohin Shah that covers some technical agendas for AI alignment. |
AI Alignment Podcast: An Overview of Technical AI Alignment with Rohin Shah (Part 2) | 2019-04-25 | Lucas Perry | Future of Life Institute | | Rohin Shah, Dylan Hadfield-Menell, Gillian Hadfield | Embedded agency, Cooperative inverse reinforcement learning, Inverse reinforcement learning, Deep reinforcement learning from human preferences, Recursive reward modeling, Iterated amplification | Part two of a podcast episode that goes into detail about some technical approaches to AI alignment. |
Scalable agent alignment via reward modeling: a research direction | 2018-11-19 | Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg | arXiv | Google DeepMind | | Recursive reward modeling, Imitation learning, Inverse reinforcement learning, Cooperative inverse reinforcement learning, Myopic reinforcement learning, Iterated amplification, Debate | This paper introduces the (recursive) reward modeling agenda, discussing its basic outline, challenges, and ways to overcome those challenges. The paper also discusses alternative agendas and their relation to reward modeling. |
New safety research agenda: scalable agent alignment via reward modeling | 2018-11-20 | Victoria Krakovna | LessWrong | Google DeepMind | Jan Leike | Recursive reward modeling, Iterated amplification | Blog post on LessWrong announcing the recursive reward modeling agenda. Some comments in the discussion thread clarify various aspects of the agenda, including its relation to Paul Christiano’s iterated amplification agenda, whether the DeepMind safety team is thinking about the problem of whether the human user is a safe agent, and more details about alternating quantifiers in the analogy to complexity theory. Jan Leike is listed as an affected person for this document because he is the lead author and is mentioned in the blog post, and also because he responds to several questions raised in the comments. |