An issue with MacAskill’s Evidentialist’s Wager

In The Evidentialist’s Wager, MacAskill et al. argue as follows:

Suppose you are uncertain as to whether EDT or CDT is the correct decision theory, and you face a Newcomb-like decision. If CDT were correct, your decision would only influence the outcome of your present choice. If EDT were correct, your decision would provide evidence not only about the outcome of your present choice, but also about the outcomes of many similar decisions made by similar agents throughout the universe (either because they are exact copies of you, or because they run very similar decision theories/​computations to yours, etc.). Thus, the stakes are far higher if EDT is true, and so you should act as if EDT were true (even if you have higher prior credence in CDT).
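To make the wager's structure concrete, here is a minimal toy expected-value sketch of my own (not from the paper); the credences, the payoff scale, and the number of correlated agents are all made-up placeholders:

```python
# Toy sketch of the Evidentialist's Wager: all numbers are hypothetical.

p_edt = 0.3          # credence that EDT is the correct decision theory
p_cdt = 1 - p_edt    # credence that CDT is correct

value_per_decision = 1.0  # value at stake in your own Newcomb-like decision
n_correlated = 10**6      # correlated agents your decision is evidence about (if EDT is true)

# If CDT is correct, only your own decision matters.
stakes_if_cdt = value_per_decision

# If EDT is correct, your decision is also evidence about every correlated agent's decision.
stakes_if_edt = value_per_decision * (1 + n_correlated)

# Credence-weighted stakes: the EDT branch dominates despite the lower credence,
# which is the wager's central move.
print(p_edt * stakes_if_edt)  # 300000.3
print(p_cdt * stakes_if_cdt)  # 0.7
```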

This argument of course depends on how many of these similar agents actually exist, and on how similar they are. The authors use the term correlated agent for an agent similar enough to you that your decision acausally provides evidence about theirs. As possible counterexamples to their argument, they point to two other kinds of agents:

  1. Anti-correlated agents: Agents whose decision theory will drive them to take the decision opposite to yours.

  2. Evil Twins: Agents positively correlated with you (sharing your decision theory) but with drastically different utility functions (or, in the extreme, the exact opposite utility function).[1]

Regarding anti-correlated agents: since our decision theories are genuinely good heuristics for navigating the world and rationally achieving our goals, it seems more likely that the agents which exist (that is, which have survived) are positively correlated with us rather than anti-correlated.

But regarding Evil Twins: because of the Orthogonality Thesis, we might expect on average that there are as many agents positively correlated with us that have ~our utility function (Good Twins) as agents positively correlated with us that have ~the opposite utility function (Evil Twins).

That is, the universe selects for agents positively correlated with us (rather than anti-correlated), but doesn’t select for agents with ~our utility function (rather than the ~opposite one).

So we should expect the acausal evidence from all these other agents to balance out, and we’re back to EDT having stakes no higher than CDT’s: only our particular decision is affected.[2]
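As a toy illustration of this balancing claim (continuing the made-up numbers above, and again just a sketch of my own), the acausal term under EDT scales with the difference between the number of Good Twins and Evil Twins, so it vanishes when the two populations are equal:

```python
# Toy continuation: the acausal term depends on the Good/Evil Twin *difference*,
# not on the total number of correlated agents. All numbers are hypothetical.

value_per_decision = 1.0
n_good_twins = 10**6  # correlated agents with ~our utility function
n_evil_twins = 10**6  # correlated agents with ~the opposite utility function

# Evidence that a Good Twin acts as you do is worth +v to you;
# evidence that an Evil Twin acts analogously (on its own goals) is worth -v.
acausal_term = value_per_decision * (n_good_twins - n_evil_twins)

stakes_if_edt = value_per_decision + acausal_term
print(stakes_if_edt)  # 1.0 — no higher than the CDT stakes when the twins balance out
```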

Possible objection

Of course, this refutation relies on our actually expecting that there are as many Evil Twins as Good Twins (or at least on our being in an epistemic position where we assign equal credence to there being more Evil Twins and to there being more Good Twins). The Orthogonality Thesis does not directly imply the former: it may well be that any “intelligence level” is compatible with any utility function in mindspace, and yet that the universe does select for a certain type of utility function. In fact, it seems likely this happens (at least subtly) in ways we don’t yet understand. But it seems very unlikely (to me) that the universe should for some reason select for ~our particular utility function.

This intuition is mainly fueled by considerations about digital minds, and more concretely by how a superintelligent agent maximizing its own utility function will most likely do things that are horrible according to ours.[3]

The actual core of the disagreement

MacAskill et al. sweep all of this under the rug by invoking “the vast number of people who may exist in the future”. On my reading, they’re not saying “and all of these people will have ~our utility function”, which would be naïve (not only because of future human misalignment, which is not a big worry for MacAskill, but also because of aliens and superintelligences). Rather, they’re saying “and this will tip the balance one way or the other, towards a majority of these agents maximizing either ~our utility function or ~its opposite, which will make the stakes for EDT higher than those for CDT”.

That is: regardless of whether we know which agents do and don’t exist, it’s very unlikely that there is exactly the same number of Good Twins and Evil Twins (except in certain infinite universes). Almost certainly the balance will be tipped in some direction, even if by pure contingent luck.

From this, they trivially conclude that EDT will have higher stakes than CDT: if there are more Good Twins (Evil Twins), EDT will recommend one-boxing (two-boxing) very strongly, since taking that action provides evidence that many other agents do the same. But I’m not satisfied with this answer, because if you don’t know whether more Good Twins or more Evil Twins exist, you won’t obtain that evidence upon taking the decision!

That is, EDT is naturally interpreted from the agent’s perspective (and her Bayes net). So knowing that there is a fact of the matter as to whether there are more Good Twins or Evil Twins (a fact an omniscient agent could read off from a bird’s-eye view) doesn’t affect the agent’s Bayes net, and doesn’t increase EDT’s stakes for her.

If we actually had some evidence privileging one of the two options (for instance, because Orthogonality fails), then EDT would certainly imply higher stakes and point us in that direction. But if we have absolutely no evidence as to which alternative obtains (or evidence so weak that it is outweighed by our prior credence in CDT), then EDT provides no higher stakes. I do think that is our situation for now.
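To restate that last point as a toy calculation (again my own sketch, with made-up numbers): what raises EDT’s stakes for the agent is an asymmetry in her credences about which twins are in the majority, not the unknown fact of the matter. With symmetric credences, the expected acausal term is zero:

```python
# Toy sketch from the agent's perspective: her *credences* over the twin
# imbalance are what matter, not the bird's-eye fact. Hypothetical numbers.

value_per_decision = 1.0
imbalance = 10**6       # |n_good - n_evil| under either hypothesis, as she models it
p_more_good = 0.5       # her credence that Good Twins are in the majority
p_more_evil = 1 - p_more_good

# Expected acausal term, computed under her own credences.
expected_acausal = value_per_decision * imbalance * (p_more_good - p_more_evil)

stakes_if_edt = value_per_decision + expected_acausal
print(stakes_if_edt)  # 1.0 with symmetric credences; grows only as p_more_good departs from 0.5
```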

  1. ^

    I don’t know whether the article’s use of the term Anti-correlated fits this definition, whether it also includes Evil Twins, or whether it even includes all agents for whom your decision provides evidence that they decided against your utility function. These details and other considerations feel glossed over in the article, and that’s my main point.

  2. ^

    This argument is very similar to the usual refutation of Pascal’s Wager.

  3. ^

    Here’s one reason why this might not be that big of a problem. For one such superintelligence to face a Newcomb-like decision, another agent must be intelligent enough to accurately predict it (“supersuperintelligent”). Of course, this doesn’t necessarily imply that the first agent won’t still face many Newcomb-like decisions with consequences we find abhorrent. But it might be that, instead of a complex structure of leveled agents cooperating or conflicting with each other, there is just one agent “at the top of the intelligence chain” (superintelligent) which controls most of its lightcone. This agent wouldn’t face any Newcomb-like decisions. As objections to this: it might still partake in other complex reasoning acausally connected to our EDT reasoning, though that’s not obvious; or it might deploy many lower-intelligence (sub)agents who do face Newcomb-like decisions (if Alignment is solvable).

    More generally, it might be that sufficiently intelligent agents discover a more refined decision theory and thus aren’t acausally connected to our EDT reasoning. But this seems unlikely, given the apparent mathematical canonicity and rational usefulness of our theories and of Newcomb-like problems.