I suspect the author might already agree with all this (the existence of this logical risk, the social dynamics, the conclusion about norms/laws being needed to reduce AI risk beyond some threshold)...
Yes I think I basically agree. That is, I think it’s very possible that capabilities research is inherently easier to automate than alignment research; I am very worried about the least cautious actors pushing forward prematurely; as I tried to emphasize in the post, I think capability restraint is extremely important (and: important even if we can successfully automate alignment research); and I think that norms/laws are likely to play an important role there.