Rohin Shah comments on leogao’s Shortform

Rohin Shah 28 Oct 2025 8:39 UTC
> it’d be fine if you held alignment constant but dialed up capabilities.
I don’t know what this means so I can’t give you a prediction about it.
> I don’t really see why it’s relevant how aligned Claude is if we’re not thinking about that as part of it
I just named three reasons:
1. Current models do not provide much evidence one way or another for existential risk from misalignment (in contrast to frequent claims that “the doomers were right”).
2. Given tremendous uncertainty, our best guess should be that future models are like current models, and so future models will not try to take over, and so existential risk from misalignment is low.
3. Some particular threat model predicted that even at current capabilities we should see significant misalignment, but we don’t see this, which is evidence against that particular threat model.
Is it relevant to the object-level question of “how hard is aligning a superintelligence”? No, not really. But people are often talking about many things other than that question.
For example, is it relevant to “how much should I defer to doomers”? Yes, absolutely (see e.g. #1).