Christopher King comments on Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program

Christopher King 7 Jun 2023 13:52 UTC
1 point
0
Actually, I think the universal prior being malign does break this. (I thought it might be only a little malign, which would be okay, but after a little reading it appears that it might be really malign!)

A crude example of how this might impact IMDEAR: while Solomonoff induction is being used to model the human, the malign prior sneakily inserts evil nanobots into the model of the bloodstream. (This specific issue can probably be patched, but there are subtler ways it can mess up IMDEAR.)

Even creating a model of the simulation environment is messed up, since I planned on using inference for the difficult part.

The only thing I guess we can hope for is to find a different prior that isn’t malign, and for now just leave the prior as a free variable. (See some of the pingbacks on the universal prior for some approaches in this direction.) But I’m not sure how likely we are to find such a prior. 🤔

Also, Paul Christiano has a proposal with similar requirements to IMDEAR, but at a lower tech level: Specifying a human precisely (reprise).

The alternative is to adjust IMDEAR to not use Solomonoff induction at all, and define/model everything directly, but this is probably much harder.
What links here?
- Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program by Christopher King (2 Jun 2023 21:54 UTC; 7 points)
- Tamsin Leake 5 Oct 2023 6:44 UTC
  2 points
  0
  it is not the case that, simply, “the universal prior is malign”. various universal priors (solomonoff induction, levin search, what QACI does, many other options…), being used in various ways, are to-various-extents malign. it depends a lot what you’re doing.
  
  i’m quite hopeful that we can get sufficiently-not-malign uses of some universal prior for QACI, and thus probably also for IMDEAR (conditional on the rest of IMDEAR being workable).