What Might an Alignment Attractor Look Like?

It is a generally accepted view in MIRI and MIRI-adjacent circles that the odds are high that an eventual self-improving AGI will not be nice to humans, to put it mildly. This view is becoming a bit of a zeitgeist, gaining ever wider support. There are plenty of arguments for it, including some observational data (like the infamous Boeing 737 MAX MCAS). It is easy to fall in with the prevailing wisdom in this particular bubble, given that really smart people give really persuasive reasons for it. So I would like to ask people to imagine an alternative: a state of the world where a self-improving general AI is naturally aligned with humanity.

Now, what does it mean to be aligned? We have a rough intuitive understanding of this: a strawberry-picking robot should not rip off people’s noses by accident; an AI that finds humans no longer worth its attention would disengage and leave; a planetary defense mechanism should not try to exterminate humans even if they want it shut down, though it might plausibly resist shutdown by non-violent means. An aligned AI, given some slack instead of a relentless drive to optimize something at any price, would choose actions that are human-compatible, rather than treating humanity like any other collection of atoms. It would help us feel better without wireheading those who don’t want to be wireheaded, and would be careful doing it to those who do. It would not manipulate our feeble, easily hackable minds into believing or acting in ways we would not consent to beforehand. There are plenty of other reasonably intuitive examples as well.

Humans are not a great example of an animal-aligned intelligence, of course. Our influence on other lifeforms has so far been a huge net negative, with the diversity of life on Earth plummeting. On the other hand, there are plenty of examples of symbiotic relationships between organisms of widely different intelligence levels, so maybe something similar is possible here. For example, maybe at some point in AI development it will be a logical step for the AI to harness some capabilities that humans possess and form a cyborg of sorts, or a collective consciousness, or uploaded human minds…

It seems that, when one actually tries to think up potential non-catastrophic outcomes, the space of them is rather large, and it is not inconceivable, given how little we still know about human and non-human intelligence, that some of those possibilities are not that remote. There are plenty of fictional examples to draw inspiration from, The Culture being one of the most prominent. The Metamorphosis of Prime Intellect is another, with a completely different bent.

So, if we were to imagine a world where there is a human-friendly attractor of sorts that a self-improving AI would settle into, what would that world look like?