At the outermost feedback loop, capabilities can ultimately be grounded via relatively easy, objective measures such as revenue from AI, or later, global chip and electricity production, but alignment can only be evaluated via potentially faulty human judgement. Also, as mentioned in the post, the capabilities trajectory is much harder to permanently derail because, unlike with alignment, one can always recover from failure and try again. I think this means there’s an irreducible logical risk that capabilities research is just inherently easier to automate than alignment research (i.e., the possibility that this is true as a matter of fact about logic/math), a risk that no amount of “work hard to automate alignment research” can lower beyond some floor. Given the lack of established, consensus ways of estimating and dealing with such risk, it’s inevitable that the people with the lowest estimates of (and least concern about) this risk and other AI risks will push capabilities forward as fast as they can, and seemingly the only way to solve this at the societal level is to push for norms/laws against doing that, i.e., slow down capabilities research via (politically legitimate) force and/or social pressure. I suspect the author might already agree with all this (the existence of this logical risk, the social dynamics, the conclusion that norms/laws are needed to reduce AI risk beyond some threshold), but I think it should be emphasized more in a post like this.
I suspect the author might already agree with all this (the existence of this logical risk, the social dynamics, the conclusion about norms/laws being needed to reduce AI risk beyond some threshold)...
Yes I think I basically agree. That is, I think it’s very possible that capabilities research is inherently easier to automate than alignment research; I am very worried about the least cautious actors pushing forward prematurely; as I tried to emphasize in the post, I think capability restraint is extremely important (and: important even if we can successfully automate alignment research); and I think that norms/laws are likely to play an important role there.