The Main Sources of AI Risk?
There are so many causes or sources of AI risk that it’s getting hard to keep them all in mind. I propose we keep a list of the main sources (that we know about), such that we can say that if none of these things happen, then we’ve mostly eliminated AI risk (as an existential risk) at least as far as we can determine. Here’s a list that I spent a couple of hours enumerating and writing down. Did I miss anything important?
1. Insufficient time/resources for AI safety (for example, caused by an intelligence explosion or an AI race)
2. Insufficient global coordination, leading to the above
3. Misspecified or incorrectly learned goals/values
4. Inner optimizers
5. ML differentially accelerating easy-to-measure goals
6. Paul Christiano’s “influence-seeking behavior” (a combination of 3 and 4 above?)
7. AI generally accelerating intellectual progress in a wrong direction (e.g., accelerating unsafe/risky technologies more than knowledge/wisdom about how to safely use those technologies)
8. Metaethical error
9. Metaphilosophical error
10. Other kinds of philosophical errors in AI design (e.g., giving AI a wrong prior or decision theory)
11. Other design/coding errors (e.g., accidentally putting a minus sign in front of the utility function, a supposedly corrigible AI not actually being corrigible)
12. Doing acausal reasoning in a wrong way (e.g., failing to make good acausal trades, being acausally extorted, failing to acausally influence others who can be so influenced)
13. Human-controlled AIs ending up with wrong values due to insufficient “metaphilosophical paternalism”
14. Human-controlled AIs causing ethical disasters (e.g., large-scale suffering that can’t be “balanced out” later) prior to reaching moral/philosophical maturity
15. Intentional corruption of human values
16. Unintentional corruption of human values
17. Mind crime (disvalue unintentionally incurred through morally relevant simulations in AIs’ minds)
18. Premature value lock-in (i.e., freezing one’s current conception of what’s good into a utility function)
19. Extortion between AIs leading to vast disvalue
20. Distributional shifts causing apparently safe/aligned AIs to stop being safe/aligned
21. Value drift and other kinds of error as AIs self-modify, or AIs failing to solve value alignment for more advanced AIs
22. Treacherous turn / loss of property rights due to insufficient competitiveness of humans & human-aligned AIs
23. Gradual loss of influence due to insufficient competitiveness of humans & human-aligned AIs
24. Utility maximizers / goal-directed AIs having an economic and/or military competitive advantage due to relative ease of cooperation/coordination, defense against value corruption and other forms of manipulation and attack, leading to one or more of the above
25. In general, the most competitive type of AI being too hard to align or to safely use
26. Computational resources being too cheap, leading to one or more of the above
(With this post I mean, among other things, to re-emphasize the disjunctive nature of AI risk, but this list isn’t fully disjunctive (i.e., some of the items are subcategories or causes of others), and I mostly gave a source of AI risk its own number in the list if it seemed important to make that source more salient. Maybe once we have a list of everything that is important, it would make sense to create a graph out of it.)
Added on 6/13/19:
27. Failure to learn how to deal with alignment in the many-humans, many-AIs case even if single-human, single-AI alignment is solved (suggested by William Saunders)
28. Economics of AGI causing concentration of power amongst human overseers
29. Inability to specify any ‘real-world’ goal for an artificial agent (suggested by Michael Cohen)
30. AI systems ending up controlled by a group of humans representing a small range of human values (i.e., an ideological or religious group that imposes values on everyone else) (suggested by William Saunders)
Added on 2/3/2020:
31. Failing to solve the commitment races problem, i.e., building AI in such a way that some sort of disastrous outcome occurs due to unwise premature commitments (or unwise hesitation in making commitments!). This overlaps significantly with #27, #19, and #12.
Added on 3/11/2020:
32. Demons in imperfect search (similar to, but distinct from, inner optimizers). See here for illustration.
Added on 10/4/2020:
33. Persuasion tools or some other form of narrow AI leading to a massive deterioration of collective epistemology, dooming humanity to stumble inexorably into some disastrous end or other.
Added on 8/31/2021:
34. Vulnerable world type 1: narrow AI enables many people to destroy the world, e.g., R&D tools that dramatically lower the cost of building WMDs.
35. Vulnerable world type 2a: we end up with many powerful actors able and incentivized to create civilization-devastating harms.
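The parenthetical note following the original list suggests eventually turning the list into a graph, since some items are causes or subcategories of others. As a minimal sketch of what that might look like, the Python below encodes only a handful of the relationships the items themselves state, with item numbers following the list's order (as in the post's own references to #12, #19, and #27). Everything here is illustrative: the names `SOURCES`, `EDGES`, `build_graph`, `downstream`, and `risk_remaining`, the short item labels, and the relation labels (`leads_to`, `component_of`, `overlaps`, `similar_to`) are hypothetical. Reading #24's "one or more of the above" as #22 and #23 is an assumption, and #26 is left out because the text doesn't say which items it leads to.

```python
from collections import defaultdict

# Hypothetical: a small subset of the list, keyed by item number, with shortened labels.
SOURCES = {
    1: "Insufficient time/resources for AI safety",
    2: "Insufficient global coordination",
    3: "Misspecified or incorrectly learned goals/values",
    4: "Inner optimizers",
    6: "Influence-seeking behavior",
    12: "Doing acausal reasoning in a wrong way",
    19: "Extortion between AIs leading to vast disvalue",
    22: "Treacherous turn / loss of property rights",
    23: "Gradual loss of influence",
    24: "Utility maximizers having a competitive advantage",
    27: "Failure to handle many-humans, many-AIs alignment",
    31: "Failing to solve the commitment races problem",
    32: "Demons in imperfect search",
}

# (source, target, relation) triples; relation labels are hypothetical.
EDGES = [
    (2, 1, "leads_to"),       # "leading to the above"
    (3, 6, "component_of"),   # "a combination of 3 and 4 above?" (tentative in the post)
    (4, 6, "component_of"),
    (24, 22, "leads_to"),     # assumption: "the above" read as items 22 and 23
    (24, 23, "leads_to"),
    (31, 27, "overlaps"),     # "overlaps significantly with #27, #19, and #12"
    (31, 19, "overlaps"),
    (31, 12, "overlaps"),
    (32, 4, "similar_to"),    # "similar to, but distinct from, inner optimizers"
]

SYMMETRIC = {"overlaps", "similar_to"}


def build_graph(edges):
    """Adjacency list mapping item -> [(neighbor, relation), ...]."""
    graph = defaultdict(list)
    for src, dst, rel in edges:
        graph[src].append((dst, rel))
        if rel in SYMMETRIC:
            # Overlap and similarity go both ways, so store the reverse edge too.
            graph[dst].append((src, rel))
    return graph


def downstream(graph, start):
    """Items reachable from `start` by following only 'leads_to' edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt, rel in graph[node]:
            if rel == "leads_to" and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen)


def risk_remaining(occurred):
    """Disjunctive framing: risk remains if any encoded source occurred.

    `occurred` maps item number -> bool; only items in SOURCES are checked.
    """
    return any(occurred.get(n, False) for n in SOURCES)
```

For example, `downstream(build_graph(EDGES), 24)` returns `[22, 23]`, and the neighbors of #31 are #27, #19, and #12. Keeping relations as typed edges rather than a single "related" link is one way to preserve the distinction the note draws between items that cause others and items that are merely subcategories or overlaps.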
[Edit on 1/28/2020: This list was created by Wei Dai. Daniel Kokotajlo offered to keep it updated and prettify it over time, and so was added as a coauthor.]