Right, thanks! That model of a governor and some tenants undergoing near-perfect surveillance is my only model for a stable long-term future. And sure, this can happen at some level above human but below superintelligence.
I was just a little thrown by the multiple superintelligences. It has in the past seemed unlikely to me that we’d wind up with a long-term agreement among different superintelligences vs. one taking over by force. But I can’t be sure!
It’s seemed unlikely to me because the arguments for cooperation among superintelligences don’t seem strong. Reading each other’s source code to establish perfect trustworthiness seems impossible for a learned, network-based ASI. And timeless decision theory being so compelling that all superintelligences would necessarily follow it also seems implausible.
But I haven’t thought through the game theory plus models of likely superintelligence alignment/goals well enough to be confident, so for all I know cooperating superintelligences is a likely outcome.
The relevance of this setup to “distribution of rapidly and unevenly expanding unregulated power” is that sufficiently strong AGIs might at some point self-regulate in this way even if never externally regulated, rendering weaker AGIs (and humanity) harmless without necessarily restricting their freedoms, including the freedom to chaotically proliferate more AGIs, except in security-relevant and resource-relevant ways.
Coordination among AGIs or ASIs needn’t be any more mysterious than coordination among people or nations. Knowledge of the other’s source code is just a technical assumption that makes the math of toy examples go through. But plausibly machine learning itself gathers data about the world that is similarly useful to what source-code knowledge provides in those toy examples.
So if AGIs/ASIs merely learn models of each other, in the same way they learn models of the world, they might be in a qualitatively similar decision-theoretic position to actually knowing each other’s source code. And there is no need to posit purely acausal coordination; they can talk to each other and set up incentives. Also, for AGIs with internals similar to modern chatbots, there is no source code in a meaningful sense: they are mostly learned models, so understanding them through other models is a much more natural framing than knowing them through some “source code”.
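For concreteness, here is a minimal sketch of the kind of toy example in which source-code access does that work (the bots, the payoff numbers, and the use of Python’s inspect module are purely illustrative, not anything from this discussion): a one-shot prisoner’s dilemma where each strategy reads the other’s source before moving, and “cooperate only with an exact copy of myself” makes mutual cooperation stable without repetition or external enforcement.

```python
import inspect

# Payoff matrix for a one-shot prisoner's dilemma: (player A, player B).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def clique_bot(opponent_source: str) -> str:
    """Cooperate only if the opponent's source code is identical to mine."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

def defect_bot(opponent_source: str) -> str:
    """Always defect, regardless of what the opponent's source says."""
    return "D"

def play(agent_a, agent_b):
    """Each agent sees the other's source code before choosing a move."""
    source_a = inspect.getsource(agent_a)
    source_b = inspect.getsource(agent_b)
    return PAYOFFS[(agent_a(source_b), agent_b(source_a))]

print(play(clique_bot, clique_bot))  # (3, 3): verified copies cooperate
print(play(clique_bot, defect_bot))  # (1, 1): the cooperator can't be exploited
```

The suggestion above is that a sufficiently good learned model of the other agent could, in principle, stand in for the literal string comparison here.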
Coordination among people isn’t mysterious, but it’s based in large part on properties that AGIs won’t have. That’s why I find hopes of stable collaborations optimistic in the absence of careful analysis of how they could be enforced or otherwise create lasting trust.
Humans collaborate in large part because:

- We can’t do it all ourselves.
  - AGIs will be able to expand their capabilities and fork as many copies as they have the hardware to run.
- We like making friends (earning positive social regard) for its own sake.
  - This will only be true of AGIs if we mostly solve alignment, or get quite lucky.
So I’m not saying AGIs couldn’t cooperate, just that it shouldn’t be assumed that they can/will.
In the absence of those properties, they’d need to worry a lot about scheming while striking deals. If the alignment problem hasn’t been clearly solved in ways that are legible to them, they can’t know whether their collaborators will turn traitor when the time is right. Just like humans, except that everyone might be (and probably is) a sociopath who can multiply and grow without limit.
Incentives only work as long as the hard constraints of the situation prevent a collaborator from slipping out of them.
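As a rough illustration of that last point, here is a toy calculation (my own numbers, assuming a repeated prisoner’s dilemma in which one party expects to eventually escape retaliation, e.g. by outgrowing its partner): once the constraint lapses, “cooperate now, defect later” dominates genuine cooperation no matter how distant the escape is.

```python
# Toy repeated prisoner's dilemma: cooperation is sustained only by the threat
# of retaliation. If an agent expects to escape that threat after k rounds,
# defecting at that point strictly dominates cooperating forever.

COOPERATE = 3.0   # per-round payoff while both cooperate (illustrative)
DEFECT = 5.0      # per-round payoff from betraying a cooperator with impunity
DISCOUNT = 0.9    # weight placed on each successive future round

def value_always_cooperate() -> float:
    """Discounted value of cooperating forever (geometric series)."""
    return COOPERATE / (1 - DISCOUNT)

def value_defect_after(k: int) -> float:
    """Cooperate for k rounds, then defect once retaliation is impossible."""
    honest_phase = sum(COOPERATE * DISCOUNT**t for t in range(k))
    betrayal_phase = DEFECT * DISCOUNT**k / (1 - DISCOUNT)
    return honest_phase + betrayal_phase

for k in (1, 5, 20):
    print(f"escape after {k:2d} rounds: "
          f"cooperate forever = {value_always_cooperate():.1f}, "
          f"defect once free = {value_defect_after(k):.1f}")
```

This is the observation above in miniature: the incentive structure holds exactly as long as the threat of retaliation, the hard constraint, remains credible.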