TsviBT comments on Alignment remains a hard, unsolved problem

TsviBT 27 Nov 2025 9:29 UTC
LW: 22 AF: 7
10
AF

> My argument, though, is that it is still very possible for the difficulty of alignment to be in the Apollo regime, and that we haven’t received much evidence to rule that regime out (I am somewhat skeptical of a P vs. NP level of difficulty, though I think it could be close to that).

Are you skeptical of PvNP-level due to priors or due to evidence? Why those priors / what evidence?

(I think alignment is pretty likely to be much harder than PvNP. Mainly this is because alignment is very very difficult. (Though also note that PvNP has a maybe-possibly-workable approach, https://en.wikipedia.org/wiki/Geometric_complexity_theory, which its creator states might take a mere one century, though I presume that’s not a serious specific estimate.))
- Cervera 28 Nov 2025 16:20 UTC
  2 points
  −2
  Could we not devise AlphaFold but for LLM alignment?
  
  Your P/NP remark reminded me of the scepticism around protein folding before the AlphaFold days.
  - TsviBT 28 Nov 2025 20:07 UTC
    6 points
    10
    I think the skepticism about protein folding was “we can’t make something effective because we can’t optimize enough / search hard enough”, whereas my skepticism about alignment is “we can’t make something aligned because we can’t aim optimization processes well enough”. Part of why we can’t aim search processes is that we don’t have easily testable proxy measurements that are bound up with alignment strongly enough. What would be the evaluation function for AlignmentFold?