Cole Wyeth comments on Changing my mind about Christiano’s malign prior argument

Cole Wyeth 4 Apr 2025 17:26 UTC
3 points
0
It’s not as bad as you’re imagining, at least not for CDT—a causal decision theorist will only “pay rent” to the universe they actually believe they’re in, and won’t make acausal trades with “more real” universes. If the prior is mistake about with universe you are in, experience should correct it.
- Knight Lee 4 Apr 2025 22:48 UTC
  3 points
  0
  Parent
  I think the intuition that “CDT agents don’t make acausal trades” is an enormous misconception by decision theorists.
  CDT agents do not try to acausally influence other agents, but they still are acausally influenced by other agents!
  Suppose you are a CDT agent, but your brain is somehow capable of running accurate simulations of alien civilizations. Suppose you discover that the typical alien civilization makes a provable promise to you, that they will run trillions of simulations of you, and that if you are in such a simulation, they will reward you immensely for doing what they want.
  Suppose you deduce that 99.9% of your copies are in such a simulation. Then the rational thing (according to CDT) is to do what they want.
  If you want to avoid this acausal influence, you need something more than just sticking to CDT.
  “If the prior is mistake about with universe you are in, experience should correct it.”
  No, given that the prior probability of being in a simulation in the “malign universe” is an overestimate, no amount observations will change the posterior probability, because simulations are indistinguishable, so nothing happens during Bayesian updating.
  - Cole Wyeth 4 Apr 2025 23:25 UTC
    3 points
    0
    Parent
    Believing you may be in a simulation is not the same as making an acausal trade, so it seems you were at least using language in an unconventional way to suggest a particular intuition about how badly things might go with a typical prior. I think that intuition is mostly wrong.
    For reasons discussed in this post, the alien civilization that runs a trillion simulations of you does not necessarily have a high prior (certainly it doesn’t get a factor of a trillion versus if it had run one simulation of you).
    Learning takes place as soon as the simulation model makes different predictions than the “default” model. For instance, when those promises are not in fact fulfilled (for this reason threats may remain credible for longer than incentives, since a threat might only manifest itself off policy).
    - Knight Lee 5 Apr 2025 0:15 UTC
      1 point
      0
      Parent
      Believing you may be in a simulation by the other party, or believing that some of your copies are simulated by the other party, is a type of acausal trade, and likely the simplest type of acausal trade.
      For CDT agents, it is the only type of acausal trade, since the other party can only influence the CDT agent’s incentives by simulating the CDT agent in a closed loop where its behaviour directly causes its outcomes.
      I admit that it is new framing to describe the malign agent simulating the other agent, as “offering it a bad acausal trade.” I believe it summarizes what I’m saying elegantly, but of course I can talk about things without this framing if you think it’s the cause of disagreement.
      When reality deviates from the simulations (promises are not fulfilled), it may be too late. The promise may require the scammed agent to do something irreversible, like build a copy of the other agent, hand over all power to the other agent, and then expect to get the reward.
      I agree the alien civilization does not necessarily have a high prior to begin with, but my point is that if you choose any system of priors (not just the Solomonoff universal prior) which is wrong by a few orders of magnitude, then even tons of observations and CDT won’t save you.
      If your prior is wrong by a few orders of magnitude, there isn’t necessarily an alien civilization with a very high prior, but there very likely is. Either your prior overestimated the weight of slightly simpler universes with slightly fewer bits (maybe like the malign Solomonoff universal prior), or your prior overestimated the weight of more complex universes. And these universes might have an order of magnitude more weight than they should, and contain intelligent agents who realize you’ve overestimated their prior probability by an order of magnitude.
      Regardless of whether you are a CDT agent or UDT agent, they will use your overestimation to create something subjectively very important to you, which you will trade much of your universe for due to you underestimating the prior probability of your actual universe.
      If you are a UDT agent, they can offer the typical acausal trade. If you are a CDT agent, they can run so many simulations of you that it is functionally the same as an acausal trade. Instead of assigning too much “value” to their offer, you assign too much “probability” to their simulations of you.
      Their simulations of you reward you for doing what they want, and you rationally take the reward (based on your prior), since getting that reward seems more probable than achieving anything in the case you are not in a simulation.
      Edit: this decision theory stuff is insidiously hard, I see other people arguing about it in the comments here. It must be easy to make a mistake and feel confident about it. I’m sorry for being argumentative, I will try to do more questioning whether I am wrong. Thank you for tolerating me :/
      - Cole Wyeth 5 Apr 2025 13:37 UTC
        3 points
        0
        Parent
        I agree it’s not just the universal distribution that can have this problem, but my objections to the malign prior argument should also be obstacles for many other versions of Adversaria.
        
        You seem to be worried that many priors would make the mistake of overweighting simulations, which means your prior doesn’t assign much probability to being in a simulation? So at least this issue should be avoidable.
        Knight Lee 5 Apr 2025 20:15 UTC
        1 point
        0
        Parent
        I think I made a major confusion, in that I forgot that you were originally talking about “whether the malign universal prior will occur in practice.”
        If we’re talking about real life AI (or humans) rather than idealized agents, I actually agree with you. I never thought about this or clarified this, which is embarrassing :/
        I don’t think the reason it won’t happen in practice, is due to “natural obstacles” to Adversaria. The objections 1, 2, 3 and 5 might narrow down to “Adversaria evolves life which controls a similar amount of computational resources (times prior probability) as you do, except your inaccurate prior overestimates them by an order of magnitude or a few.” The objections 4 and 6 may narrow down to “At least some life on Adversaria follows UDT etc. instead of CDT, and have a better prior than you.”
        These objections make it uncertain whether it will occur in practice, but are not very reassuring.
        Instead, the real reason I don’t think it’ll happen in practice, is that a real life artificial superintelligence will not be a simple Bayesian reasoner equipped with an immutable/imperfect prior and utility function.
        The “malign priors” demonstrate that such a Bayesian reasoner equipped with an immutable/imperfect prior and utility function, is “sort of stupid,” and can be scammed despite knowing that the scammers think they are scamming it.
        Instead, I think what will happen will happen is this. The first generations of superintelligence, will be fuzzy reasoners just like humans, and use many heuristics which we call “common sense,” and not fall for these scams for the same reasons humans do not. Eventually, higher levels of superintelligence, (perhaps when making commitments and preparing for acausal trades?), will formalize their decision theory and reasoning.
        When deciding on how to formalize their decision theory and reasoning, they will do a lot of thinking, and reinvent all the thought experiments (e.g. malign priors) which humans could possibly think of, plus much more. And only after they are far less confused about decision theory than humans, will they finally proceed and formalize their decision theory and reasoning.
        And it will be a much smarter design than the Solomonoff universal prior or AIXI. They will laugh at humans for believing this is the optimal way to think.