Alex Mallen comments on Alignment remains a hard, unsolved problem

Alex Mallen 27 Nov 2025 18:27 UTC
7 points
I think the main reason to expect cognitive oversight to scale better is that, because you’re reading intermediate computations rather than behaviors, the AI isn’t as capable of manipulating how they look, even after you optimize against the cognitive oversight. In the limit of fine-grained cognitive oversight, the computations that led to your reading simply aren’t expressive enough to fool you.