Here is an old post of mine on the hope that “computationally simplest model describing the box” is actually a physical model of the box. I’m less optimistic than you are, but it’s certainly plausible.
From the perspective of optimization daemons / inner alignment, I think the interesting question is: if inner alignment turns out to be a hard problem for training cognitive policies, do we expect it to become much easier by training predictive models? I’d bet against at 1:1 odds, but not at 1:2 odds.
If I’m understanding correctly, and I’m very unsure that I am, you’re comparing the model-based approach of [learn the environment then do good planning] with [learn to imitate a policy]. (Note that any iterated approach to improving a policy requires learning the environment, so I don’t see what “training cognitive policies” could mean besides imitation learning.) And the question you’re wondering about is whether optimization daemons become easier to avoid when following the [learn the environment then do good planning] approach.
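To make the contrast concrete, here is a toy sketch of the two approaches on a trivial chain environment. The environment, function names, and training loops are all my own illustrative constructions, not anything proposed in the thread:

```python
import random

N = 5  # goal state of a 1-D chain {0, ..., N}

def step(state, action):
    """True environment dynamics (hidden from the learners)."""
    return max(0, min(N, state + action))

# --- [learn the environment then do good planning] ---
def model_based():
    # Learn a transition table from random exploration.
    model = {}
    for _ in range(500):
        s, a = random.randint(0, N), random.choice([-1, 1])
        model[(s, a)] = step(s, a)
    # "Plan" against the learned model: pick the action whose
    # predicted successor state is closest to the goal.
    def act(s):
        return max([-1, 1], key=lambda a: model.get((s, a), s))
    return act

# --- [learn to imitate a policy] ---
def imitation():
    # Memorize an expert demonstration (the expert always moves right).
    demos = {s: 1 for s in range(N)}
    def act(s):
        return demos.get(s, random.choice([-1, 1]))
    return act

for name, agent in [("model-based", model_based()), ("imitation", imitation())]:
    s = 0
    for _ in range(N):
        s = step(s, agent(s))
    print(name, "reached state", s)  # both should reach state 5
```

Both agents reach the goal here, but only the model-based one learns the dynamics explicitly and plans against them; the imitator only ever predicts the demonstrator.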
Imitation learning is about prediction just as much as predictive models are—predictive models imitate the environment. So I suppose optimization daemons are about equally likely to appear?
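To spell out that parallel (this framing and notation are mine, assuming a standard maximum-likelihood setup): both objectives are the same log-loss, differing only in what is being predicted.

$$\mathcal{L}_{\text{pred}}(\theta) = -\,\mathbb{E}_{(s,a,s')\sim\text{env}}\!\left[\log p_\theta(s' \mid s, a)\right] \qquad \mathcal{L}_{\text{imit}}(\theta) = -\,\mathbb{E}_{(s,a)\sim\text{expert}}\!\left[\log \pi_\theta(a \mid s)\right]$$

The first imitates the environment, the second imitates the demonstrator; the same prediction machinery is being selected for in both cases.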
My real answer, though, is that I’m not sure; either way, vanilla imitation learning isn’t competitive.
But I suspect I’ve misunderstood your question.
I don’t actually rely on this assumption, although it underpins the intuition behind Assumption 2.
I agree that you don’t rely on this assumption (so I was wrong to assume you are more optimistic than I am). In the literal limit, you don’t need to care about any of the considerations of the kind I was raising in my post.