I am a bit skeptical that it is always (or more than 90% of the time) true that there is a root cause outside the body that is useful to identify. For example, maybe someone goes through a stressful life period, and this, combined with some kind of genetic susceptibility, triggers an autoimmune condition. It might be more useful to trace back to that autoimmune condition and then consider that “the cause” without tracing back further, because the external causes might be very arbitrary and uninformative in the context of your current state.
I think your framing makes sense under the assumption that the body is very functional and very self-correcting by default, such that any malady can be traced back to some intervening genetic situation or external circumstance, which then becomes super informative about which interventions would be best. I think this is not a bad assumption. I am not sure how much I agree with it, but I think it’s probably at least a little underrated.
I think the human body is super complex, and it’s good to keep an open mind about how to think about any given health issue and how to frame health in general, so I would worry about someone getting too attached to this frame. But I still think this is an interesting and useful perspective.
Yes, that’s fair; I think it’s non-obvious where it falls. I chose to categorize it under inner misalignment because it constitutes a reason the AI might be misaligned with the objective function.
I figure if you are going to call gradient misalignment a specification error, you could just as well call any inner misalignment a failure to specify the right training regimen that gives you what you want. More generally, any failure to generalize can be construed as a failure to specify. So it seems more meaningful to me to define the outer/inner dichotomy as “objective misaligned with intention” vs. “AI misaligned with objective,” as opposed to “failure to specify” vs. “failure to generalize.”
Hmm, yeah, I think I would call this “perfect-correlate misalignment”: somehow the inductive biases of training are such that it converges toward misaligned goals. This is somewhat unclear from my title and description. I guess I think that, in practice, the result of this misaligned bias is likely to be based around some kind of correlate, so this is a convenient way to think about it.