Title | Publication date | Author | Publisher | Affected organizations | Affected people | Notes |
---|---|---|---|---|---|---|
AI Alignment Podcast: An Overview of Technical AI Alignment with Rohin Shah (Part 2) | 2019-04-25 | Lucas Perry | Future of Life Institute | | Rohin Shah, Dylan Hadfield-Menell, Gillian Hadfield | Part two of a podcast episode that goes into detail about some technical approaches to AI alignment. |
Scalable agent alignment via reward modeling: a research direction | 2018-11-19 | Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg | arXiv | Google DeepMind | | This paper introduces the (recursive) reward modeling agenda, discussing its basic outline, its challenges, and ways to overcome those challenges. The paper also discusses alternative agendas and their relation to reward modeling. |