I think the interesting question is: if inner alignment turns out to be a hard problem for training cognitive policies, do we expect it to become much easier by training predictive models?
If I’m understanding correctly, and I’m very unsure that I am, you’re comparing the model-based approach of [learn the environment then do good planning] with [learn to imitate a policy]. (Note that any iterated approach to improving a policy requires learning the environment, so I don’t see what “training cognitive policies” could mean besides imitation learning.) And the question you’re wondering about is whether optimization daemons become easier to avoid when following the [learn the environment then do good planning] approach.
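To make the contrast concrete, here is a minimal toy sketch of the two approaches as I'm reading them (a made-up gridworld and hypothetical helper names, purely illustrative, not anyone's actual proposal):

```python
from collections import Counter
from itertools import product

ACTIONS = [-1, +1]   # step left or right on a line of positions 0..5
GOAL = 4

def step(state, action):
    return max(0, min(5, state + action))

# --- [learn the environment then do good planning] ---
def learn_world_model(transitions):
    """Prediction problem #1: predict the next state from (state, action)."""
    return {(s, a): s_next for s, a, s_next in transitions}

def plan(world_model, start, horizon=4):
    """Brute-force search against the learned model for a sequence that reaches GOAL."""
    for seq in product(ACTIONS, repeat=horizon):
        s = start
        for a in seq:
            s = world_model.get((s, a), s)  # unseen transitions assumed to do nothing
        if s == GOAL:
            return seq
    return None

# --- [learn to imitate a policy] ---
def learn_imitation_policy(demonstrations):
    """Prediction problem #2: predict the demonstrator's action from the state."""
    counts = {}
    for s, a in demonstrations:
        counts.setdefault(s, Counter())[a] += 1
    return {s: ctr.most_common(1)[0][0] for s, ctr in counts.items()}

# Data gathered by watching a demonstrator walk right from 0 toward the goal.
transitions = [(s, +1, step(s, +1)) for s in range(4)]
demonstrations = [(s, +1) for s in range(4)]

print(plan(learn_world_model(transitions), start=0))   # (1, 1, 1, 1)
policy = learn_imitation_policy(demonstrations)
print([policy[s] for s in range(4)])                    # [1, 1, 1, 1]
```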
Imitation learning is about prediction just as much as predictive models are—predictive models imitate the environment. So I suppose optimization daemons are about equally likely to appear?
My real answer, though, is that I’m not sure, but vanilla imitation learning isn’t competitive.
But I suspect I’ve misunderstood your question.