I agree there’s a core principle somewhere around the idea of “controllable implies understandable”. But when I think about this with respect to humans studying biology, another thought comes to mind: the things we want to control are not necessarily the things the system itself is controlling. For example, we would like to control the obesity crisis (and weight loss in general), but it’s not clear that the biological system itself is controlling that. It almost certainly was successfully controlling it in the ancestral environment (and was therefore understandable within that environment), but perhaps the environment has changed enough that it is now uncontrollable (and potentially not understandable). Cancer manages to successfully control the system in the sense of causing itself to happen, but that doesn’t mean that our goal, “reliably stopping cancer”, is understandable, since it is not a way that the system is controlling itself.
This mismatch seems pretty evidently applicable to AI alignment.
And perhaps the “environment” part is critical. A system being controllable in one environment doesn’t imply that it is controllable in a different (or broader) environment, and thus the guaranteed understandability is also lost. This feels like an expression of misgeneralization.
Yeah, I think I misspoke a bit. I do think that controllability is related to understanding, but also I’m trying to gesture at something more like “controllability implies simpleness.”
I think what I’m tracking with “controllability implies simpleness” is that the ease with which we can control things is a function of how many causal factors there are in creating it, i.e., “conjunctive things are less likely” in some Occam’s Razor sense, but also conjunctive things “cost more.” At the very least, they cost more from the agent’s point of view. Like, if I have to control every step in some horror graph in order to get the outcome I want, that’s a lot of stuff that has to go perfectly, and that’s hard. If “I,” on the other hand, as a virus, or cancer, or my brain, or whatever, only have to trigger a particular action to set off a cascade which reliably results in what I want, this is easier.
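The “conjunctive things cost more” point can be made numerically. A toy sketch, under the assumption that each causal step you must control succeeds independently with some probability (the function name and numbers are illustrative, not from the original):

```python
def success_prob(p: float, steps: int) -> float:
    """Probability that ALL `steps` independently controlled steps succeed,
    assuming each succeeds with probability p."""
    return p ** steps

# Controlling every node of a 20-step causal graph, each 95% reliable:
many_steps = success_prob(0.95, 20)   # ~0.36 — most attempts fail somewhere

# Triggering one reliable cascade instead (a single 95% step):
one_trigger = success_prob(0.95, 1)   # 0.95
```

Even with quite reliable individual steps, conjunctive control degrades exponentially in the number of steps, which is one way to cash out why agents (viruses, brains) favor single triggers that set off pre-built cascades.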
There is a sense, of course, in which the spaghetti code thing is still happening underneath it all, but I think it matters what level of abstraction you’re taking with respect to the system. Like, with enough zoom, even very simple activities start looking complex. If you looked at a ball following a parabolic arc at the particle level, it’d be pretty wild. And yet, one might reasonably assume that the regularity with which the ball follows the arc is meaningful. I am suspicious that many of the “it’s all ad-hoc” intuitions about biology/neuroscience/etc. are making a zoom error (also typing errors, but that’s another problem). Even simple things can look complicated if you don’t look at them right, and I suspect that the ease with which we can operate in our world should be some evidence that this simpleness “exists,” much like the movement of the ball is “simple” even though a whole world of particles underlies its behavior.
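The ball example can be made concrete: at the macroscopic level of abstraction, the entire trajectory collapses into one short formula, even though a particle-level description would be enormous. A minimal sketch, assuming ideal projectile motion with no drag (function and parameter names are illustrative):

```python
def height(v0: float, t: float, g: float = 9.81) -> float:
    """Height of a ball at time t, launched straight up at speed v0 (m/s),
    under constant gravity g (m/s^2): y(t) = v0*t - (1/2)*g*t^2."""
    return v0 * t - 0.5 * g * t * t

# One line of physics predicts the whole arc:
y_at_1s = height(10.0, 1.0)  # 5.095 m
```

The point is not the physics itself but that a lawful, compressible description exists at the right zoom level, which is the sense in which the ball's motion is "simple."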
To your point, I do think that the ability to control something is related to understanding it, but not exactly the same. Like, there indeed might be things in the environment that we don’t have much control over, even if we can clearly see what their effects are. Although, I’d be a little surprised if that were true for, e.g., obesity. Like, it seems likely that hormones (e.g., semaglutide) can control weight loss, which makes sense to me. A bunch of global stuff in bodies is regulated by hormones, in a similar way, I think, to how viruses “hook into” the cell’s machinery, i.e., they reliably trigger the right cascades. And controlling something doesn’t necessarily mean that it’s possible to understand it completely. But I do suspect that the ability to control implies the existence of some level of analysis upon which the activity is pretty simple.