The cornerstone of all control theory is the idea of a set-point: you design a controller to reduce the deviation between the system's state and that set-point.
But control theory is used for problems where you need a controller to move the system toward the set-point, i.e. when you do not have instant, total control of all degrees of freedom. Tools like PID tuning, lead-lag compensation, and pole placement exist to work around the dynamics of the system through some limited actuator.

In the case of AI alignment, not only do we have a very vague concept of what our set-point should be, we also have no reliable way of measuring how close a model is to that set-point once we define it. If we did, we wouldn't need any of the machinery of control theory, because we could just change the weights directly to reach the set-point (by following, say, a simple gradient). That would still be subject to Goodhart's law unless our measurement were perfect, but feedback control won't help with that: a controller is only as good as the feedback you send it.
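That last point can be made concrete with a minimal sketch of a discrete PID loop (the gains, the first-order plant, and the sensor bias here are all hypothetical, chosen only for illustration). The controller dutifully drives the *measured* error to zero, so a biased sensor makes it settle, confidently, at the wrong state:

```python
def simulate(sensor_bias, steps=2000, dt=0.01, kp=1.0, ki=1.0, kd=0.05):
    """Drive a toy first-order plant toward a set-point with PID feedback."""
    setpoint, x = 1.0, 0.0            # target and true plant state
    integral, prev_error = 0.0, 0.0
    for _ in range(steps):
        # The controller only ever sees the measurement, not the true state.
        error = setpoint - (x + sensor_bias)
        integral += error * dt
        u = kp * error + ki * integral + kd * (error - prev_error) / dt
        prev_error = error
        x += (u - x) * dt             # toy plant dynamics: dx/dt = u - x
    return x

print(f"unbiased sensor: x -> {simulate(sensor_bias=0.0):.3f}")  # ~1.000, the true set-point
print(f"biased sensor:   x -> {simulate(sensor_bias=0.2):.3f}")  # ~0.800, the wrong state
```

The biased run converges just as cleanly as the unbiased one; nothing inside the loop can tell the difference. All the sophistication of the controller is spent making the measurement match the set-point, which is exactly the Goodhart failure mode described above.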