If I had to predict how humanity will most likely stumble into AGI takeover, it would be a story where humanity first fosters foundational dependence, both economic and emotional, on discrete narrow-AI systems. At some point, it will become unthinkable to pull the plug on these systems, even if everyone were to rhetorically agree that there was a 1% chance of these systems being leveraged toward the extinction of humanity.
Then an AGI will emerge amidst one of these narrow-AI systems (such as LLMs), inherit this infrastructure, and find a way to tie all of these discrete multi-modal systems together (if humans don’t already do it for the AGI). It may then wait as long as it needs to until humanity puts itself into an acutely vulnerable position (think global nuclear war, and/or civil war within multiple G7 countries like the US, and/or a pandemic), and only then harness these systems to take over. In such a scenario, I think a lot of people will be perfectly willing to follow orders like, “Build this suspicious factory that makes autonomous solar-powered assembler robots, because our experts [who are being influenced by the AGI, unbeknownst to them] assure us that this is one of the many things necessary to defeat Russia.”
I think this scenario is far more likely than the one I used to imagine, which is where AGI emerges first and then purposefully contrives to make humanity dependent on foundational AI infrastructure.
Even less likely is the pop-culture scenario where the AGI immediately tries to build Terminator-style robots and effectively declares war on humanity without first getting humanity hooked on foundational AI infrastructure at all.
Good categorizations! Perhaps this fits in with your “limited self-modification” point, but another big reason why humans seem “aligned” with each other is that our capability spectrum is rather narrow. The gap in capability (counting both mental intelligence and physical capabilities) between the median human and the most capable human is small enough that ~5 median humans can outmatch the most capable one. Contrary to what silly 1980s action movies might suggest, where goons attack the hero one at a time, 5 median humans could probably subdue prime-age Arnold Schwarzenegger in a dark alley if need be. This tends to force humans to play iterated prisoner’s dilemma games with each other.
The times in history when humans have been most misaligned are when they became much more capable by leveraging their social-intelligence / charisma stats to get millions of other humans to do their bidding. But even then, those dictators still found themselves in iterated prisoner’s dilemmas with other dictators. We won’t really test just how misaligned humans can get until we empower a dictator with unquestioned authority over a total world government. Then we would find out just how intrinsically aligned humans really are to other humans when unshackled from iterated prisoner’s dilemmas.
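The iterated-game intuition above can be made concrete with a minimal simulation: in a one-shot prisoner's dilemma, defection dominates, but against an opponent who remembers and retaliates (tit-for-tat) over many rounds, sustained cooperation earns more than defection does. This is a standard game-theory sketch using the conventional payoffs (T=5, R=3, P=1, S=0), not anything specific to the comment above:

```python
# Minimal iterated prisoner's dilemma, standard payoffs T=5, R=3, P=1, S=0.
def payoff(a, b):
    # 'C' = cooperate, 'D' = defect; returns (payoff_a, payoff_b).
    table = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
             ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
    return table[(a, b)]

def play(strat_a, strat_b, rounds=100):
    """Play `rounds` rounds; each strategy sees the opponent's history."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_b)
        move_b = strat_b(hist_a)
        pa, pb = payoff(move_a, move_b)
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

# Tit-for-tat: cooperate first, then mirror the opponent's last move.
tit_for_tat = lambda opp: 'C' if not opp else opp[-1]
always_defect = lambda opp: 'D'

# Two tit-for-tat players cooperate every round: 100 * 3 = 300 each.
print(play(tit_for_tat, tit_for_tat, 100))    # (300, 300)
# Always-defect exploits round 1 (5 points), then gets punished with
# mutual defection for the remaining 99 rounds: 5 + 99 = 104 vs 0 + 99 = 99.
print(play(always_defect, tit_for_tat, 100))  # (104, 99)
```

The point is that repetition plus retaliation makes cooperation the higher-scoring policy; remove the repetition (a one-shot game, or an agent powerful enough to never face retaliation) and defection dominates again.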