kapedalex comments on Alignment Problems All the Way Down

kapedalex 8 Nov 2025 16:19 UTC
1 point
It seems to me that a mesa-optimizer could be aligned with the same methods used to align a base model. If so, the distinction between the “can create” and “can control” steps would be minimized. What concerns me more is how we would detect the existence of mesa-optimizers, especially if they emerge randomly.