My argument, though, is that it is still very possible for the difficulty of alignment to be in the Apollo regime, and that we haven’t received much evidence to rule that regime out (I am somewhat skeptical of a P vs. NP level of difficulty, though I think it could be close to that).
Are you skeptical of PvNP-level due to priors or due to evidence? Why those priors / what evidence?
(I think alignment is pretty likely to be much harder than PvNP. Mainly this is because alignment is very, very difficult. (Though also note that PvNP has a maybe-possibly-workable approach, https://en.wikipedia.org/wiki/Geometric_complexity_theory, which its creator states might take a mere one century, though I presume that’s not a serious specific estimate.))
Could we not devise AlphaFold but for LLM alignment?
Your P/NP remark reminded me of the scepticism around protein folding before the AlphaFold days.
I think the skepticism about protein folding was “we can’t make something effective because we can’t optimize enough / search hard enough”, where my skepticism about alignment is “we can’t make something aligned because we can’t aim optimization processes well enough”. Part of how we can’t aim search processes is that we don’t have easily testable proxy measurements that are bound up with alignment strongly enough. What would be the evaluation function for AlignmentFold?