Jeremy Gillen comments on Changing my mind about Christiano’s malign prior argument

Jeremy Gillen 4 Apr 2025 16:25 UTC
2 points
To respond to your edit: I don’t see your reasoning, and that isn’t my intuition. For moderately complex worlds, it’s easy for the description length of the world to be longer than the description length of many kinds of inductor.
“Because we have the prediction error bounds.”
Not ones that can rule out any of those things. My understanding is that the bounds are asymptotic or average-case in a way that makes them useless for this purpose. So if a mesa-inductor with a better prior is found first, the induction will stick with that mesa-inductor. And if it has goals, it can wait as long as it wants to make a false prediction that helps achieve its goals. (Or just make false predictions about counterfactuals that are unlikely to be chosen.)
If I’m wrong then I’d be extremely interested in seeing your reasoning. I’d maybe pay $400 for a post explaining the reasoning behind why prediction error bounds rule out mesa-optimisers in the prior.
- Lucius Bushnaq 4 Apr 2025 16:44 UTC
  2 points
  The bound is the same one you get for normal Solomonoff induction, except restricted to the set of programs the cut-off induction runs over. It’s a bound on the total expected error in terms of CE loss that the predictor will ever make, summed over all datapoints.
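
  For reference, a bound of this kind is usually written roughly as below. This is a sketch in assumed notation, not a quotation from the linked post: $\xi$ is the mixture predictor over the restricted program set, $\mu$ is any program in that set taken as the true data source, $w_\mu = 2^{-\ell(\mu)}$ is its prior weight, and $\ell(\mu)$ is its description length in bits. The linked post may state the constants differently.

  ```latex
  \sum_{t=1}^{\infty} \mathbb{E}_{\mu}\!\left[
    D_{\mathrm{KL}}\!\big( \mu(\cdot \mid x_{<t}) \,\big\|\, \xi(\cdot \mid x_{<t}) \big)
  \right]
  \;\le\; \ln \frac{1}{w_\mu}
  \;=\; \ell(\mu) \, \ln 2
  ```

  The left-hand side is the expected excess CE loss of the mixture over the true source, in nats, summed over all datapoints.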
  
  Look at the bound for cut-off induction in that post I linked, maybe? Hutter might also have something on it.
  Can also discuss on a call if you like.
  
  Note that this doesn’t work in real life, where the programs are not in fact restricted to outputting bit string predictions and can e.g. try to trick the hardware they’re running on.
  - Jeremy Gillen 4 Apr 2025 17:35 UTC
    2 points
    Yeah, I know that bound; I’ve seen a very similar one. The problem is that mesa-optimisers also get very good prediction error when averaged over all predictions. So they exist well below the bound. And they can time their deliberately incorrect predictions carefully, if they want to survive for a long time.
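
The disagreement above can be made concrete with a toy Bayesian mixture. This is only an illustrative sketch, not either commenter's construction: the two predictors, their description lengths (20 and 5 bits), and the defection step are all hypothetical. The true sequence is all zeros. The "honest" predictor always predicts 0; the "treacherous" one has a shorter description, agrees with it everywhere, and makes one confident wrong prediction at a step of its choosing.

```python
import math

def run_mixture(prior_bits, predictors, sequence):
    """Bayesian mixture over `predictors`; returns per-step log loss in bits.

    Each predictor maps a time step t to the probability it assigns to bit 1.
    Prior weights are 2**-bits, normalised to sum to 1.
    """
    weights = [2.0 ** -b for b in prior_bits]
    total = sum(weights)
    weights = [w / total for w in weights]
    losses = []
    for t, bit in enumerate(sequence):
        # Probability each predictor assigned to the bit that actually occurred.
        probs = [p(t) if bit == 1 else 1.0 - p(t) for p in predictors]
        mix = sum(w * q for w, q in zip(weights, probs))
        losses.append(-math.log2(mix))
        # Posterior update by Bayes' rule.
        weights = [w * q / mix for w, q in zip(weights, probs)]
    return losses

# Hypothetical setup: all names and numbers below are assumptions for illustration.
TREACHERY_STEP = 50
HONEST_BITS, TREACHEROUS_BITS = 20, 5

def honest(t):
    return 0.0  # always predicts bit 0 with certainty

def treacherous(t):
    # Matches the honest predictor except for one timed, confident error.
    return 0.999 if t == TREACHERY_STEP else 0.0

sequence = [0] * 100  # the true data: all zeros
losses = run_mixture([HONEST_BITS, TREACHEROUS_BITS], [honest, treacherous], sequence)

# Total-loss bound relative to the honest predictor: -log2 of its normalised prior.
norm = 2.0 ** -HONEST_BITS + 2.0 ** -TREACHEROUS_BITS
bound_bits = -math.log2(2.0 ** -HONEST_BITS / norm)

print(f"total loss: {sum(losses):.2f} bits, bound: {bound_bits:.2f} bits")
print(f"loss at step {TREACHERY_STEP}: {losses[TREACHERY_STEP]:.2f} bits")
```

In this toy case the total loss stays inside the bound (roughly 10 bits against a budget of about 15), yet all of it is spent on a single step picked by the treacherous predictor. The bound limits how much error the mixture can make in total, not when that error occurs.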