Here is an old post of mine on the hope that “computationally simplest model describing the box” is actually a physical model of the box. I’m less optimistic than you are, but it’s certainly plausible.
From the perspective of optimization daemons / inner alignment, I think the interesting question is: if inner alignment turns out to be a hard problem for training cognitive policies, do we expect it to become much easier by training predictive models? I’d bet against at 1:1 odds, but not at 1:2 odds.
If I’m understanding correctly, and I’m very unsure that I am, you’re comparing the model-based approach of [learn the environment then do good planning] with [learn to imitate a policy]. (Note that any iterated approach to improving a policy requires learning the environment, so I don’t see what “training cognitive policies” could mean besides imitation learning.) And the question you’re wondering about is whether optimization daemons become easier to avoid when following the [learn the environment then do good planning] approach.
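To make the contrast concrete, here is a toy sketch of the two approaches on a trivial chain environment. The environment, function names, and training loops are all my own illustrative constructions, not anything proposed in the thread:

```python
import random

N = 5  # goal state of a 1-D chain {0, ..., N}

def step(state, action):
    """True environment dynamics (hidden from the learners)."""
    return max(0, min(N, state + action))

# --- [learn the environment then do good planning] ---
def model_based():
    # Learn a transition table from random exploration.
    model = {}
    for _ in range(500):
        s, a = random.randint(0, N), random.choice([-1, 1])
        model[(s, a)] = step(s, a)
    # "Plan" against the learned model: pick the action whose
    # predicted successor state is closest to the goal.
    def act(s):
        return max([-1, 1], key=lambda a: model.get((s, a), s))
    return act

# --- [learn to imitate a policy] ---
def imitation():
    # Memorize an expert demonstration (the expert always moves right).
    demos = {s: 1 for s in range(N)}
    def act(s):
        return demos.get(s, random.choice([-1, 1]))
    return act

for name, agent in [("model-based", model_based()), ("imitation", imitation())]:
    s = 0
    for _ in range(N):
        s = step(s, agent(s))
    print(name, "reached state", s)  # both should reach state 5
```

Both agents reach the goal here, but only the model-based one learns the dynamics explicitly and plans against them; the imitator only ever predicts the demonstrator.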
Imitation learning is about prediction just as much as predictive models are—predictive models imitate the environment. So I suppose optimization daemons are about equally likely to appear?
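To spell out that parallel (this framing and notation are mine, assuming a standard maximum-likelihood setup): both objectives are the same log-loss, differing only in what is being predicted.

$$\mathcal{L}_{\text{pred}}(\theta) = -\,\mathbb{E}_{(s,a,s')\sim\text{env}}\!\left[\log p_\theta(s' \mid s, a)\right] \qquad \mathcal{L}_{\text{imit}}(\theta) = -\,\mathbb{E}_{(s,a)\sim\text{expert}}\!\left[\log \pi_\theta(a \mid s)\right]$$

The first imitates the environment, the second imitates the demonstrator; the same prediction machinery is being selected for in both cases.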
My real answer, though, is that I’m not sure; either way, vanilla imitation learning isn’t competitive.
But I suspect I’ve misunderstood your question.
I don’t actually rely on this assumption, although it underpins the intuition behind Assumption 2.
I agree that you don’t rely on this assumption (so I was wrong to assume you are more optimistic than I am). In the literal limit, you don’t need to care about any of the considerations of the kind I was raising in my post.