Some thoughts that skimming this post generated:
If a catastrophe happens, then either:
1. It happened so discontinuously that we couldn’t have avoided it even with a concentrated effort.
2. It happened slowly, but for some reason we didn’t make a concentrated effort. This could be because:
  a. We didn’t notice it (e.g. an intelligence explosion inside a lab).
  b. We couldn’t coordinate a concentrated effort, even if we all individually wanted one to exist (e.g. no way to ensure China isn’t racing faster).
  c. We didn’t act individually rationally (e.g. Trump doesn’t listen to advisors / Trump is brainwashed by AI).
1 seems less likely by the day.
2a is mostly about transparency into labs (and, less importantly, into economic developments), which is important, but at least some people are thinking about it.
There’s a lot to think through in 2b and 2c. It might be critical to ensure that early takeoff improves them, rather than degrading them (absent any drastic action to the contrary), until late takeoff can land the final blow.
If we assume sufficiently hierarchical power structures, the situation simplifies to “what 5 world leaders do”, and then it’s pretty clear you mostly want communication channels and trustless agreements for 2b, and improved national decision-making for 2c.
Maybe what I’d be most excited to see from the “systemic risk” crowd is detailed thinking about, and concrete examples of, how the assumption of sufficiently hierarchical power structures is wrong (that is, how the outcome depends strongly on things other than what those 5 world leaders do), which additional dynamics in that area are most worrisome for x-risk, and how to intervene on them.
(Maybe all this is still too abstract, but it cleared my head)
Wait, but I thought 1 and 2a look the same from a first-person perspective. I mean, I don’t really notice the difference between something happening suddenly and something that’s been happening for a while — until the consequences become “significant” enough for me to notice. In hindsight, sure, one can find differences, but in the moment? Probably not?
I mean, single-single alignment assumes that the operator (human) is happy with the goals their AI is pursuing — not necessarily* with the consequences of how pursuing those goals affects the world around them (especially in a world where other human+AI agents are also pursuing their own goals).
And so, like someone pointed out in a comment above, we might mistake early stages of disempowerment — the kind that eventually leads to undesirable outcomes in the economy/society/etc. — for empowerment. Because from the individual human’s perspective, that is what it feels like.
No?
What am I missing here?
*Unless we assume the AI somewhat “teaches” the human what goals they should want to pursue — from a very non-myopic perspective.