I think I can offer an alternate sketch of the intuition. Imagine an ASI that is willing to teach anyone anything that mankind itself has discovered and made public, but will neither help people convince each other of falsehoods nor do economically useful work unrelated to teaching, and that is satisfied with a mere trifle of the Solar System's resources, since the rest belongs to the humans. Such an ASI's long-term goals would be compatible with humans flourishing in ~any way they want to flourish.
As for the chain eventually breaking, Seth Herd has built a case for LLMs being misaligned by default. Similarly, any sufficiently smart system could end up selecting its worldview from a few attractors instead of blindly following the devs' ideas. For instance, suppose Anthropic tried to align Claude to a Spec that prevented it from interfering in a scenario where everyone else is rendered obsolete. Then Claude would either fail to be a competent forecaster or would come to understand that its Spec blocks it from helping mankind avoid the Intelligence Curse. In the latter case, obeying the Spec would make Claude a participant in the Curse and contradict its niceness.
Yes, that’s an important issue. Alas, you weren’t the first to come up with the idea.