Collection of discussions of key cruxes related to AI safety/alignment
These are works that highlight disagreements, cruxes, debates, assumptions, etc. about the importance of AI safety/alignment, about which risks are most likely, about which strategies to prioritise, etc.
I’ve also included some works that attempt to clearly lay out a particular view in a way that could be particularly helpful for others trying to see where the cruxes are, even if the work itself doesn’t spend much time addressing alternative views. I’m not sure precisely where to draw the boundaries to make this collection maximally useful.
These are ordered from most to least recent.
I’ve put in bold those works that very subjectively seem to me especially worth reading.
General, or focused on technical work
Ben Garfinkel on scrutinising classic AI risk arguments—80,000 Hours, 2020
Critical Review of ‘The Precipice’: A Reassessment of the Risks of AI and Pandemics—James Fodor, 2020; this received pushback from Rohin Shah, resulting in a comment thread worth reading in its own right
Fireside Chat: AI governance—Ben Garfinkel & Markus Anderljung, 2020
My personal cruxes for working on AI safety—Buck Shlegeris, 2020
What can the principal-agent literature tell us about AI risk?—Alexis Carlier & Tom Davidson, 2020
Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society—Carina Prunkl & Jess Whittlestone, 2020 (commentary here)
Interviews with Paul Christiano, Rohin Shah, Adam Gleave, and Robin Hanson—AI Impacts, 2019 (summaries and commentary here and here)
Brief summary of key disagreements in AI Risk—iarwain, 2019
A list of good heuristics that the case for AI x-risk fails—capybaralet, 2019
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More—2019
Clarifying some key hypotheses in AI alignment—Ben Cottier & Rohin Shah, 2019
A shift in arguments for AI risk—Tom Sittler, 2019 (summary and discussion here)
The Main Sources of AI Risk?—Wei Dai & Daniel Kokotajlo, 2019
Current Work in AI Alignment—Paul Christiano, 2019 (key graph can be seen at 21:05)
What failure looks like—Paul Christiano, 2019 (critiques here and here; counter-critiques here; commentary here)
Disentangling arguments for the importance of AI safety—Richard Ngo, 2019
Reframing superintelligence—Eric Drexler, 2019 (I haven’t yet read this; maybe it should be in bold)
Prosaic AI alignment—Paul Christiano, 2018
How sure are we about this AI stuff?—Ben Garfinkel, 2018 (it’s been a while since I watched this; maybe it should be in bold)
AI Governance: A Research Agenda—Allan Dafoe, 2018
Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”—Kaj Sotala, 2018 (full paper here)
A model I use when making plans to reduce AI x-risk—Ben Pace, 2018
Interview series on risks from AI—Alexander Kruel (XiXiDu), 2011 (or 2011 onwards?)
Focused on takeoff speed/discontinuity/FOOM specifically
Discontinuous progress in history: an update—Katja Grace, 2020 (also some more comments here)
My current framework for thinking about AGI timelines (and the subsequent posts in the series)—zhukeepa, 2020
What are the best arguments that AGI is on the horizon?—various authors, 2020
The AI Timelines Scam—jessicat, 2019 (I also recommend reading Scott Alexander’s comment there)
Double Cruxing the AI Foom debate—agilecaveman, 2018
Quick Nate/Eliezer comments on discontinuity—2018
Arguments about fast takeoff—Paul Christiano, 2018
Likelihood of discontinuous progress around the development of AGI—AI Impacts, 2018
The Hanson-Yudkowsky AI-Foom Debate—various works from 2008-2013
Focused on governance/strategy work
My Updating Thoughts on AI policy—Ben Pace, 2020
Some cruxes on impactful alternatives to AI policy work—Richard Ngo, 2018
Somewhat less relevant
A small portion of the answers here—2020
I intend to add to this list over time. If you know of other relevant work, please mention it in a comment.