Not sure how you get P(consensus-1 is misaligned | Race) = p^2 (EDIT: this was explained to me and I was just being silly). Maybe there’s an implicit assumption there I’m not seeing.
But I agree that this is roughly what AI 2027 says. It also does seem that racing in itself causes bad outcomes.
However, isn't "create a parliament of AIs that are all different and hope one of them is sorta aligned" a different strategy, based on very different assumptions? (It is a strategy I don't agree with, but I think the AI 2027 logic would be unlikely to convince someone who did agree with it, and that would be reasonable given that person's priors.)