I don’t know how to prevent ASI from killing us. However, when I try to imagine worlds in which humanity isn’t immediately destroyed by ASI, our survival can often be traced back to some bottleneck in the ASI’s capabilities.
For example, point 35 of Eliezer’s List of Lethalities argues that “Schemes for playing ‘different’ AIs off against each other stop working if those AIs advance to the point of being able to coordinate via reasoning about (probability distributions over) each others’ code”, because “Any system of sufficiently intelligent agents can **probably** behave as a single agent, even if you imagine you’re playing them against each other.” Note that he says “probably” (boldface mine).
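To make the mechanism in that quote concrete, here is a toy sketch of my own (not from Eliezer’s post) of program-equilibrium-style coordination: two copies of an agent that can read each other’s source code cooperate in a one-shot prisoner’s dilemma, even though neither cooperates with an agent running different code. The names (`clique_bot`, `defect_bot`, `play`) are purely illustrative.

```python
# Toy illustration of coordination via access to each other's code:
# the classic "CliqueBot" from the open-source prisoner's dilemma literature.

import inspect


def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent's source code is identical to mine."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"


def defect_bot(opponent_source: str) -> str:
    """Always defect, regardless of the opponent's code."""
    return "D"


def play(agent_a, agent_b):
    """One-shot prisoner's dilemma in which each agent sees the other's source."""
    move_a = agent_a(inspect.getsource(agent_b))
    move_b = agent_b(inspect.getsource(agent_a))
    return move_a, move_b


if __name__ == "__main__":
    print(play(clique_bot, clique_bot))  # ('C', 'C') -- two copies coordinate
    print(play(clique_bot, defect_bot))  # ('D', 'D') -- no coordination with a defector
```

The point of the toy: access to each other’s code is exactly what lets the two copies behave as a single agent, which is the failure mode point 35 warns about when you try to play AIs off against each other.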
In a world where humanity wasn’t immediately destroyed by ASI, I find it plausible (let’s say 10%) that something like Arrow’s impossibility theorem exists for multi-agent coordination, and that we were able to exploit it to successfully pit different AIs against each other.
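For readers who want the analogy spelled out, below is the standard textbook statement of Arrow’s theorem I have in mind; whether any analogous impossibility result holds for coordination between AIs is, again, pure speculation on my part.

```latex
% Arrow's impossibility theorem (standard statement, included only for reference).
% For a finite set of alternatives A with |A| >= 3 and n voters, there is no
% social welfare function
%   F : L(A)^n -> L(A)   (L(A) = strict linear orders on A)
% that simultaneously satisfies:
%   (i)   Unrestricted domain: F is defined on every profile of rankings;
%   (ii)  Weak Pareto: if every voter ranks x above y, then F does too;
%   (iii) Independence of irrelevant alternatives: the social ranking of x vs. y
%         depends only on the voters' rankings of x vs. y;
%   (iv)  Non-dictatorship: no single voter i has F(r_1, ..., r_n) = r_i always.
\[
  |A| \ge 3 \;\Longrightarrow\; \nexists\, F : L(A)^n \to L(A)
  \ \text{satisfying (i)--(iv)}.
\]
```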
Of course, you may argue that “10% of worlds not immediately destroyed by ASI” is a tiny slice of probability space, that even in those worlds the ability to pit AIs against each other is not sufficient, or that the scenario isn’t plausible to begin with. However, I hope I have explained why I believe the idea of exploiting ASI limitations is a step in the right direction.