I’m unclear on whether you’re saying that there would be a stable equilibrium among ASIs, or a singleton governing everything and allowing wide latitude of action.
A single AGI can achieve anything if it can self-improve to create ASI without anyone knowing. Working underground or elsewhere in the solar system seems hard to detect and prevent once we have the robotics to seed such an effort, which won’t take long.
I did read and greatly enjoyed your linked post. I do think that’s a plausible and underdeveloped area of thought. I don’t find it all that likely for complex reasons, but it’s definitely worth more thought. I didn’t get around to commenting on it; maybe I’ll go do that to put that discussion in a better place.
Superintelligent governance serves as an anchor for the argument about mere AGIs that I’m putting forward. I’m not distinguishing a singleton from many coordinated ASIs; that shouldn’t matter for how effectively they manage their tenants. The stable situation is one where every Earth-originating intelligent entity in the universe is either one of the governing superintelligences or a tenant of one, and a tenant can’t do anything too disruptive without its host’s permission. So it’s like countries and residency, but with total surveillance of anything potentially relevant, and in principle without the bad things that total surveillance would bring in a human government. Avoiding those bad things seems likely for the factory-farming disanalogy reasons: tenants aren’t instrumentally useful anyway, so there is no point in doing the things that would, in particular, end up having bad side effects for them.
So the argument is that you don’t necessarily need superintelligence to make this happen; it could also work with sufficiently capable AGIs as the hosts. Even if merely human-level AGIs are insufficient, there might be some intermediate level, way below superintelligence, that’s sufficient. Then a single AGI can’t actually achieve anything in secret or self-improve to ASI, because there is no unsupervised hardware for it to run on, and on supervised hardware it’d be found out and prevented from doing that.
Right, thanks! That model of a governor and some tenants undergoing near-perfect surveillance is my only model for a stable long-term future. And sure, this could happen at some level above human but below superintelligence.
I was just a little thrown by the multiple superintelligences. It has in the past seemed unlikely to me that we’d wind up with a long-term agreement among different superintelligences vs. one taking over by force. But I can’t be sure!
It’s seemed unlikely to me because the arguments for cooperation among superintelligences don’t seem strong. Reading each other’s source code for perfect trustworthiness seems impossible for an ASI based on learned networks. And timeless decision theory being so good that all superintelligences would necessarily follow it also seems implausible.
But I haven’t thought through the game theory plus models of likely superintelligence alignment/goals well enough to be confident, so for all I know cooperating superintelligences is a likely outcome.
The relevance of this setup to the “distribution of rapidly and unevenly expanding unregulated power” is that sufficiently strong AGIs might self-regulate in this way at some point, even if not externally regulated, rendering weaker AGIs (and humanity) harmless without necessarily restricting their freedoms, including the freedom to chaotically proliferate more AGIs, except in security-relevant and resource-relevant ways.
Coordination among AGIs or ASIs needn’t be any more mysterious than coordination among people or nations. Knowledge of the source code is just a technical assumption that makes the math of toy examples go through. But plausibly machine learning itself gathers similarly useful data about the world, of the kind that knowledge of the source code is meant to stand in for in such toy examples.
So if AGIs/ASIs merely learn models of each other, in the same way they learn models of the world, they might be in a qualitatively similar decision-theoretic position to actually knowing the source code. And there is no need to posit purely acausal coordination; they can talk to each other and set up incentives. Also, in the case of AGIs with internals similar to modern chatbots, there is no source code in a meaningful sense; they are mostly models, so the framing of understanding and knowing them in the form of other models is much more natural than knowing them in the form of some “source code”.
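To make the toy-example point concrete, here’s a minimal sketch in Python. Everything in it (clique_bot, fair_bot, defect_bot, the payoff numbers, the bounded simulation depth) is made up for illustration and isn’t taken from any particular construction in the decision-theory literature: one agent coordinates by literally checking the opponent’s source code, the other by querying a model of the opponent the same way it would query any learned model.

```python
# Toy one-shot prisoner's dilemma: coordination via reading the opponent's
# source code vs. via querying a (stand-in for a learned) model of the opponent.
# All names and numbers here are illustrative assumptions, not an established API.

import inspect

COOPERATE, DEFECT = "C", "D"

# Payoff table, written as (my payoff, their payoff).
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1),
}


def clique_bot(opponent_source: str) -> str:
    """The 'knows the source code' toy assumption: cooperate iff the opponent's
    source code is literally identical to mine."""
    return COOPERATE if opponent_source == inspect.getsource(clique_bot) else DEFECT


def defect_bot(*_args) -> str:
    """Always defects, whatever it knows about the opponent."""
    return DEFECT


def fair_bot(opponent_model, depth: int = 3) -> str:
    """Cooperate iff a model of the opponent, queried against a model of me,
    predicts cooperation. Bounded depth cuts off the regress of mutual
    modelling; the base case optimistically assumes cooperation (a common
    toy-model choice)."""
    if depth == 0:
        return COOPERATE

    def model_of_me(other, d=depth - 1):
        return fair_bot(other, d)

    predicted = opponent_model(model_of_me, depth - 1)
    return COOPERATE if predicted == COOPERATE else DEFECT


if __name__ == "__main__":
    # Source-code case: two identical clique_bots recognise each other and cooperate.
    src = inspect.getsource(clique_bot)
    a, b = clique_bot(src), clique_bot(src)
    print("clique_bot vs clique_bot:", PAYOFFS[(a, b)])  # (3, 3)

    # Model case: mutual modelling reaches cooperation without any source code,
    # while still defending against an unconditional defector.
    a, b = fair_bot(fair_bot), fair_bot(fair_bot)
    print("fair_bot vs fair_bot:", PAYOFFS[(a, b)])  # (3, 3)
    a, b = fair_bot(defect_bot), defect_bot(fair_bot)
    print("fair_bot vs defect_bot:", PAYOFFS[(a, b)])  # (1, 1)
```

Two fair_bots end up cooperating with each other while still defecting against an unconditional defector, which is the sense in which having a good enough model of the other agent can do the same work, in these toy setups, as knowing its source code.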
Coordination among people isn’t mysterious, but it’s based in large part on properties that AGIs won’t have. That’s why I find hopes of stable collaborations optimistic in the absence of careful analysis of how they could be enforced or otherwise create lasting trust.
Humans collaborate in large part because:

- We can’t do it all ourselves. AGIs, by contrast, will be able to expand their capabilities and fork as many copies as they have the hardware to run.
- We like making friends (earning positive social regard) for its own sake. This will only be true of AGIs if we mostly solve alignment, or get quite lucky.
So I’m not saying AGIs couldn’t cooperate, just that it shouldn’t be assumed that they can/will.
In the absence of those properties, they’d need to worry a lot about scheming while striking deals. If the alignment problem isn’t clearly solved in ways legible to them, they don’t know whether their collaborators will turn traitor when the time is right. Just like humans, except everyone might be (probably is) a sociopath who can multiply and grow without limit.
Incentives only work as long as the hard constraints of the situation prevent a collaborator from slipping out of them.