I agree it’s not just the universal distribution that can have this problem, but my objections to the malign prior argument should also be obstacles for many other versions of Adversaria.
You seem to be worried that many priors would make the mistake of overweighting simulations, which suggests your own prior doesn’t assign much probability to being in a simulation? If so, at least this issue should be avoidable.
I think I was badly confused here, in that I forgot you were originally talking about “whether the malign universal prior will occur in practice.”
If we’re talking about real-life AI (or humans) rather than idealized agents, I actually agree with you. I never thought about or clarified this distinction, which is embarrassing :/
I don’t think the reason it won’t happen in practice is “natural obstacles” to Adversaria. Objections 1, 2, 3, and 5 might narrow down to “Adversaria evolves life which controls a similar amount of computational resources (times prior probability) as you do, except your inaccurate prior overestimates them by one or a few orders of magnitude.” Objections 4 and 6 may narrow down to “At least some life on Adversaria follows UDT etc. instead of CDT, and has a better prior than you.”
These objections make it uncertain whether it will occur in practice, but are not very reassuring.
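To make the “order of magnitude” point concrete, here is a toy sketch of my own (not from the original discussion, and ignoring the resource factor in “resources times prior probability”). It assumes both the “direct physics” hypothesis and the “simulators on Adversaria” hypothesis explain your past observations equally well, so the posterior weights are just the renormalized priors; an order-of-magnitude misestimate of the simulators’ prior weight then translates directly into an order-of-magnitude misweighting of whatever continuation they want you to predict.

```python
def posterior_weights(prior_direct, prior_sim,
                      likelihood_direct=1.0, likelihood_sim=1.0):
    """Renormalize prior * likelihood for two competing hypotheses
    that fit the observed data equally well."""
    w_direct = prior_direct * likelihood_direct
    w_sim = prior_sim * likelihood_sim
    z = w_direct + w_sim
    return w_direct / z, w_sim / z

# Accurate prior: the simulators' weight is comparable to the direct hypothesis.
print(posterior_weights(prior_direct=1e-6, prior_sim=1e-6))  # -> (0.5, 0.5)

# Inaccurate prior: the simulators are overestimated by an order of magnitude,
# so their preferred continuation dominates your predictions ~10:1.
print(posterior_weights(prior_direct=1e-6, prior_sim=1e-5))  # -> (~0.09, ~0.91)
```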
Instead, the real reason I don’t think it’ll happen in practice is that a real-life artificial superintelligence will not be a simple Bayesian reasoner equipped with an immutable/imperfect prior and utility function.
The “malign prior” arguments demonstrate that such a Bayesian reasoner, equipped with an immutable/imperfect prior and utility function, is “sort of stupid,” and can be scammed even while knowing that the scammers think they are scamming it.
Instead, I think what will happen is this. The first generations of superintelligence will be fuzzy reasoners, just like humans: they will use many heuristics we call “common sense,” and will not fall for these scams for the same reasons humans do not. Eventually, higher levels of superintelligence (perhaps when making commitments and preparing for acausal trades?) will formalize their decision theory and reasoning.
When deciding how to formalize their decision theory and reasoning, they will do a lot of thinking, and reinvent all the thought experiments (e.g. malign priors) that humans could possibly think of, plus many more. Only after they are far less confused about decision theory than humans will they finally proceed with the formalization.
And the result will be a much smarter design than the Solomonoff universal prior or AIXI. They will laugh at humans for believing those are the optimal way to think.