Thanks for the feedback. I agree that in a control system, any divergence between intent and outcome is an alignment issue, and I agree that this makes overoptimization different in control versus selection. Despite the conceptual confusion, I definitely think the connections are worth noting: not only to “wireheading,” but also to the issues with mesa-optimizers. And I think causal failures are particularly important in this context.
But I strongly agree that this is all still weak and fuzzy, which is a large part of why I wanted to try to de-confuse myself. That’s the goal of this mini-sequence, and I hope that doing so publicly at least highlights where the confusion is, even if I can’t successfully de-confuse myself, much less others. And if there are places where others are materially less confused than you or I, I’d love for them to write responses or their own explainers on this.
I think I already want to back off on my assertion that the categories should not be applied to controllers. However, I see the application to controllers as more complex. It’s clearer what it means to (successfully) point a selection-style optimization process at a proxy. In a selection setting, you have the proxy (which the system can access) and the true value (which is not accessible). Wireheading only makes sense when the “true” value is partially accessible, and the agent severs that connection.
I definitely appreciate your posts on this; it hadn’t occurred to me to ask whether the four types apply equally well to selection and control.