I believe Leo believes this, and it’s somewhat implied by the statement, though imo that statement is also consistent with eg “there’s a 50% chance that by the time we make AGI we’ll have figured out how to align ASI, therefore it’s fine to continue to that point and then we can stop if need be or continue”.
“I wish we could coordinate to not do that.”
“I support efforts to stop racing at substantial cost”
To me the latter doesn’t seem like the important claim: “should we take significant cost” doesn’t feel like a crux; rather, “would this work at all” feels like my crux, and “I wish” reads to me as sidestepping that. This was the key bit of what Leo said that I consider much softer than what the post asks for.
I’m also not sure the former implies the latter, but that gets messier: eg, if you think we will figure out how to align ASI after a 6-12 month pause at the right time, it doesn’t seem very costly to push for that, while if you think it will take at least 20 years and might totally fail, maybe it does. I consider the statement to be agnostic on this.
To be clear, I know Leo’s actual beliefs are on the doomy end, but I’m trying to focus on what I think the statement actually says, and what I meant when I called it obviously reasonable.
Other positions I consider consistent with the statement:
- It would be ridiculous to pause now; instead, we should wait until we have good automated safety researchers under good control schemes, then pause for 6-12 months to solve alignment, then continue
- There’s absolutely no way to coordinate with China, there’s a 90% chance ASI is fine by default, and China getting to ASI first would be catastrophic, so the US should race to beat China and roll the dice
- Pushing for a ban is a terrible idea, because it would lead to a hardware overhang and could easily end too early for us to solve the problem, leaving us with less time to practice on proto-AGI systems
Agreed, this is implied.