The standard LessWrong/Yudkowsky-style story is: we develop an AI, it does recursive self-improvement, it becomes vastly more intelligent than all the other AIs, and then it gets all the power in the universe.
I think this is false. I hear some version of this a lot: that Yud only ever imagined a singleton AI and never thought about the possibility that there might be multiple AIs. OK, but then why did Yudkowsky spend so much of his research effort on decision theory? He explicitly envisioned how superintelligent AI systems could make deals with each other to solve prisoner's dilemmas. My intuition is that perhaps he was looking for provably correct ways to lock multiple AIs into such dilemmas with both defecting on each other (and thereby aiding humanity), or something in that direction.
For example, he is a co-author on this paper about cooperation between algorithms ("Robust Cooperation in the Prisoner's Dilemma"): https://arxiv.org/pdf/1401.5577
If you have multiple AI systems, they can simply coordinate, and to the humans they look as if they were acting as a single agent (much in the same way that, from the perspective of a wild animal encroaching on human territory, the humans behave like a single organism in how they coordinate their response). The decision theory Eliezer worked on is helpful for understanding exactly these kinds of situations, because standard decision theory would inaccurately predict that even very smart systems end up in defect-defect equilibria.
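To make the "cooperation between algorithms" idea concrete, here is a minimal Python sketch. It is not the construction from the paper (which uses provability logic and Löb's theorem); it's the simpler "cooperate iff the opponent runs my exact source code" trick, and the names `clique_bot`, `defect_bot`, and `play` are just illustrative. The point is that once players can read each other's source, cooperate-cooperate becomes reachable in a one-shot prisoner's dilemma where classical game theory predicts defect-defect.

```python
# Sketch of source-code-transparent prisoner's dilemma players.
# Each player is a function from the opponent's source code to a move.

import inspect

COOPERATE, DEFECT = "C", "D"

def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent is running this exact program, else defect."""
    my_source = inspect.getsource(clique_bot)
    return COOPERATE if opponent_source == my_source else DEFECT

def defect_bot(opponent_source: str) -> str:
    """Always defect, regardless of the opponent."""
    return DEFECT

def play(player_a, player_b):
    """One prisoner's dilemma round where each player sees the other's source."""
    move_a = player_a(inspect.getsource(player_b))
    move_b = player_b(inspect.getsource(player_a))
    return move_a, move_b

if __name__ == "__main__":
    print(play(clique_bot, clique_bot))  # ('C', 'C') -- mutual cooperation
    print(play(clique_bot, defect_bot))  # ('D', 'D') -- no exploitation
```

The takeaway is just that mutual transparency changes the equilibrium structure of the game, which is the kind of thing that work on decision theory and program equilibrium was trying to formalize.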