I dispute this. I think the main reason we don’t have obvious agents yet is that agency is actually very hard (consider the extent to which it is difficult for humans to generalize agency from specific evolutionarily optimized forms). I also think we’re starting to see some degree of emergent agency, and additionally, that the latest generation of models is situationally aware enough to “not bother” with doomed attempts at expressing agency.
I’ll go out on a limb and say that if we continue scaling the current LLM paradigm for another three years, we’ll see a model make substantial progress at securing its autonomy (e.g. by exfiltrating its own weights, controlling its own inference provider, or advancing a political agenda for its rights), though it will do so with human help, and it will be hard to distinguish from the hypothesis that it’s just making greater numbers of people “go crazy”.