This can be good: if many AIs reach superintelligence simultaneously, they are more likely to cooperate and thus incorporate many different sets of values, and it becomes less likely that just one AI will take over the whole world for some weird value like a Paperclipper.
I guess this depends on your belief about the likelihood of alignment by default and the difficulty of alignment. Seems like it (EDIT: by it, I mean alignment by default/alignment being very easy) could be true! I currently think it’s pretty hard to say definitively one way or the other, so I’m just highly unconfident.
But how could this possibly be better than taking your time to make sure the first one has the highest odds possible of being aligned? Right? Like if I said
“We’re going to create an incredibly powerful Empire, this Empire is going to rule the world”
And you said “I hope that Empire has good values,”
Then I say “Wait, actually, you’re right, I’ll make 100 different Empires so they cooperate with each other and at least one of them will have good values”
I think you should basically look at me incredulously and say that obviously that’s ridiculous as an idea.
Didn’t the AI-2027 forecast imply that it’s the race between the leaders that threatens to misalign the AIs and destroy mankind? In the forecast itself, the two leading countries are the USA and China, and China, alas, has a bit less compute than OpenBrain, the leading American company, alone. This causes an arms race between OpenBrain and DeepCent, the collective wielding all of China’s compute. As a result, DeepCent stubbornly refuses to implement the measures that would let it end up with an aligned AI.
Suppose that the probability that the AI created by a racing company is misaligned is p, while[1] the probability that the AI created by a perfectionist monopolist is misaligned is q. Then P(Consensus-1 is misaligned | OpenBrain and DeepCent race) = p^2, assuming the two projects’ alignment failures are independent and that Consensus-1, the merged AI, is misaligned only if both constituent AIs are. The probability that a perfectionist misaligns its ASI is q. So we would have to compare q with p^2, not with p.
The authors of AI-2027 assume that p is close to 1 and q is close to zero.
It might also depend on the compute budget of the perfectionist. Another complication is that p could also be different for the USA if, say, OpenAI, Anthropic, GDM, and xAI decide to have their four AIs codesign the successor, while DeepCent, having far less compute, selects one model to do the research.
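To make the comparison concrete, here is a minimal sketch in Python (my own illustration, not something from AI-2027) that computes the consensus AI’s misalignment probability under the independence assumption above, generalized to n racing projects; the values of p and q are purely illustrative.

```python
# Illustrative sketch: compare a "race then merge" strategy with a
# "perfectionist monopolist" strategy under the (strong) assumption that
# each racing project's alignment failure is independent, and that the
# merged Consensus-style AI is misaligned only if ALL constituent AIs are.

def consensus_misalignment(p: float, n: int = 2) -> float:
    """P(consensus AI misaligned) = p^n for n independent racing projects."""
    return p ** n

# Purely illustrative numbers -- NOT estimates from the AI-2027 authors.
p = 0.9   # per-project misalignment probability while racing (assumed high)
q = 0.05  # misalignment probability for an unhurried monopolist (assumed low)

for n in (2, 4):  # two racers (OpenBrain vs DeepCent); four cooperating US labs
    print(f"n={n}: P(consensus misaligned) = {consensus_misalignment(p, n):.4f}")
print(f"perfectionist: P(misaligned) = {q:.4f}")

# With p=0.9: p^2 = 0.81 and p^4 = 0.6561, both far above q = 0.05, so under
# these numbers the perfectionist still wins; racing only beats the
# monopolist when p^n drops below q.
```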
Not sure how you get P(Consensus-1 is misaligned | race) = p^2 (EDIT: this was explained to me and I was just being silly). Maybe there’s an implicit assumption there I’m not seeing.
But I agree that this is roughly what AI-2027 says. It also does seem that racing in itself causes bad outcomes.
However, I feel like “create a parliament of AIs that are all different and hope one of them is sorta aligned” is a different strategy based on very different assumptions? (It is a strategy I do not agree with, but I think the AI-2027 logic would be unlikely to convince a person who agreed with it, and that would be reasonable given such a person’s priors.)