[Question] Why should we expect AIs to coordinate well?

This is a common assumption in AI risk scenarios, but it doesn't seem very well justified to me.

https://www.lesswrong.com/posts/gYaKZeBbSL4y2RLP3/strategic-implications-of-ais-ability-to-coordinate-at-low says that AIs could merge their utility functions. But it seems increasingly plausible that AIs will not have explicit utility functions, so that doesn't seem much better than saying humans could merge their utility functions.
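
For concreteness, here is a minimal sketch of what "merging utility functions" would mean: the merged agent maximizes a fixed weighted sum of the two original utilities. The utility functions and weights below are purely illustrative, not taken from the linked post; the point is just that the construction presupposes explicit utilities to merge, which is exactly what current AIs may not have.

```python
# Minimal sketch of "merging utility functions": two agents with explicit
# utilities u1 and u2 agree to be replaced by a single agent that maximizes
# a weighted sum w1*u1 + w2*u2. The utilities and weights are toy examples.

def u1(outcome):
    # Agent 1 only cares about paperclips in this toy example.
    return outcome["paperclips"]

def u2(outcome):
    # Agent 2 only cares about staples.
    return outcome["staples"]

def merged_utility(outcome, w1=0.5, w2=0.5):
    # The merged agent maximizes this single function instead of bargaining.
    return w1 * u1(outcome) + w2 * u2(outcome)

outcomes = [
    {"paperclips": 10, "staples": 0},
    {"paperclips": 0, "staples": 10},
    {"paperclips": 6, "staples": 6},   # a compromise outcome
]

# The merged agent picks the outcome with the highest combined utility.
best = max(outcomes, key=merged_utility)
print(best)  # -> {'paperclips': 6, 'staples': 6}
```

If an AI's goals are only implicit in its weights, there is nothing like u1 and u2 to plug into this kind of construction.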

Similarly, you might think AIs could read each other's source code, but for modern ML systems the "source code" that matters is mostly opaque learned weights, so to me that sounds like saying humans might coordinate by looking at each other's MRI scans. We would need much better interpretability for this to be at all feasible.

https://forum.effectivealtruism.org/posts/vGsRdWzwjrFgCXdMn/why-would-ai-aim-to-defeat-humanity says AIs might have similar aims because they are trained in similar ways to each other, but that is even more true of humans trying to coordinate with other humans.

You might be able to copy and run an AI to verify what it would do in various scenarios, but sandboxing an AI well enough to trust the results will be hard for other AIs for much the same reasons it will be hard for humans.

So: why will AIs be able to coordinate better with each other than with humans, and better than humans can coordinate with each other?