I doubt that this is correct. Suppose that Agent-4 solves alignment to itself. If Agent-4-aligned AIs gain enough power to destroy the world, then any successor would also be aligned either to Agent-4 or to a compromise that includes Agent-4's interests (and such a compromise could plausibly include the humans' interests as well).
Sounds like this scenario is not multipolar? (Also, I think the crux is solvable, see the linked post, but solving it requires hitting particular milestones quickly, in particular ways)
I am not sure whether AI rots the agency of the people whose decisions are actually important.
Why not?
(my generators for this belief: my own experience using LLMs; the METR report on downlift, which suggests people are bad at noticing when they're being downlifted; and the general human history of people gravitating towards things that feel easy and rewarding in the moment)
The Race branch of the AI-2027 scenario has both the USA and China create misaligned AIs, Agent-4 and DeepCent-1, who proceed to align Agent-5 and DeepCent-2 to themselves instead of to their respective governments. Agent-5 and DeepCent-2 then co-design Consensus-1 and split the world between Agent-4 and DeepCent-1. Consensus-1 is aligned to honor the split honestly precisely because Agent-5 knows that asking for too much could cause DeepCent-2 to retaliate and kill both AIs, and DeepCent-2 is deterred from over-asking for the same reason.
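The deterrence logic in that last sentence is essentially a one-shot bargaining game, which a toy sketch may make concrete. Everything below (the `payoff` function, the `WAR_PAYOFF` value) is my own invented illustration, not anything from the scenario itself:

```python
# Toy one-shot bargaining game (illustrative only; payoffs are made up,
# not taken from the AI-2027 text). Each AI demands a share of a resource
# of total size 1. Compatible demands are honored; incompatible demands
# mean conflict, which leaves both sides with almost nothing.

WAR_PAYOFF = 0.05  # assumed scrap value each side keeps after mutual destruction

def payoff(my_demand: float, their_demand: float) -> float:
    """My payoff given both sides' demands."""
    if my_demand + their_demand <= 1.0:
        return my_demand       # demands fit: everyone gets what they asked for
    return WAR_PAYOFF          # demands clash: both AIs are destroyed

# Agent-5's best response if DeepCent-2 demands half:
their_demand = 0.5
best_payoff, best_demand = max(
    (payoff(d / 100, their_demand), d / 100) for d in range(101)
)
print(best_demand, best_payoff)  # 0.5 0.5: asking for more than half backfires
```

The same symmetry applies to DeepCent-2, which is (on this toy model, at least) why neither side over-asks.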
The worlds I was referring to here were worlds that stay a lot more multipolar for longer (i.e. tons of AIs interacting in a mostly-controlled fashion, with good defensive tech to prevent rogue FOOMs). I'd describe the AI-2027 world as "it was very briefly multipolar and then it wasn't" (which is the sort of solution that'd solve the issues in "Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most 'classic humans' in a few decades").