# Two New Newcomb Variants

Two Newcomb variants to add to the list of examples where optimal choice and optimal policy are diametrically opposed. I don’t think these problems exist anywhere else yet.

### 4 Boxes Problem

In a game show there are 4 transparent boxes in a row, each of which starts off with \$1 inside. Before the human player enters, the show’s 4 superintelligent hosts have a competition: each has a box assigned to them, and they win if the human chooses their box. They cannot alter their own box at all, but to motivate the human they can put \$100 into any or all of the other 3 boxes.

Our first human is an adherent of CDT. Since the boxes are transparent, and he wants to get money, he obviously chooses the one with the most. It’s not like his decisions can change what the hosts have already done. Putting money into the other boxes would only have hurt the hosts’ chances, so they didn’t do that. Instead, all the boxes have only the original \$1. CDT is disappointed, but picks one at random and goes home.

Our second human is an adherent of EDT. She knows that if she sees how much is in the boxes, she’ll pick the one with the most, which will end up being only \$1. Because of this, she blindfolds herself before walking on stage. She weighs the two left boxes together to see how much they hold in total, and then does the same for the two on the right. “If the left two are heavier I’ll pick one of them at random,” she says, “if the right two, I’ll pick one of those, and if they’re the same I’ll pick any at random.” EDT was quite happy when she found out she was going to do this. She’d worried that she wouldn’t get the blindfold on in time and would only get \$1, but it seemed to work out. The hosts predicted this, of course, so the two on the left put \$100 in each other’s boxes to increase their own odds, and the two on the right did the same, and EDT picks at random and leaves with \$101. [I’m not completely sure EDT can’t do better than this, so corrections with even more elaborate schemes are encouraged.]

Our third human is an adherent of FDT. She considers the policies she could implement, and decides that she will take one of the boxes with the least money. Not wanting to give their opponents an advantage, the hosts all put \$100 in every other box, and FDT leaves with \$301.
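A small sketch for checking the three 4-box stories. It assumes, since the post doesn’t pin these down, that hosts play pure strategies, that each host cares only about the probability its own box is picked, and that ties are broken uniformly at random. `is_nash` only checks that no host can do better by deviating on its own; it doesn’t show the hosts would actually settle on that arrangement. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

N = 4               # four boxes; host i owns box i
BASE, BONUS = 1, 100


def contents(profile):
    """profile[i] is the set of *other* boxes that host i adds $100 to."""
    return [BASE + BONUS * sum(b in funded for funded in profile) for b in range(N)]


def uniform(boxes):
    boxes = list(boxes)
    return {b: Fraction(1, len(boxes)) for b in boxes}


def cdt(c):
    # Looks at the boxes and takes (one of) the fullest.
    top = max(c)
    return uniform(b for b in range(N) if c[b] == top)


def edt_pairs(c):
    # Blindfolded: only compares the weight of boxes {0, 1} with boxes {2, 3}.
    left, right = c[0] + c[1], c[2] + c[3]
    if left > right:
        return uniform([0, 1])
    if right > left:
        return uniform([2, 3])
    return uniform(range(N))


def fdt(c):
    # Committed policy: take (one of) the emptiest boxes.
    low = min(c)
    return uniform(b for b in range(N) if c[b] == low)


def options(i):
    """Every subset of the other three boxes that host i could fund."""
    others = [b for b in range(N) if b != i]
    return [frozenset(s) for r in range(len(others) + 1) for s in combinations(others, r)]


def win_prob(policy, profile, i):
    return policy(contents(profile)).get(i, Fraction(0))


def is_nash(policy, profile):
    """True if no host can raise its own win probability by deviating alone."""
    for i in range(N):
        current = win_prob(policy, profile, i)
        for alt in options(i):
            trial = list(profile)
            trial[i] = alt
            if win_prob(policy, trial, i) > current:
                return False
    return True


def human_payoff(policy, profile):
    c = contents(profile)
    return sum(p * c[b] for b, p in policy(c).items())


nobody = [frozenset()] * N                                                   # CDT story
partners = [frozenset({1}), frozenset({0}), frozenset({3}), frozenset({2})]  # EDT story
everyone = [frozenset(b for b in range(N) if b != i) for i in range(N)]      # FDT story

for name, policy, profile in [("CDT", cdt, nobody), ("EDT", edt_pairs, partners), ("FDT", fdt, everyone)]:
    print(name, is_nash(policy, profile), human_payoff(policy, profile))
```

Running it should print `True` for each player, with expected takes of 1, 101 and 301 dollars. Checking the all-\$1 arrangement with `is_nash(edt_pairs, nobody)` or `is_nash(fdt, nobody)` gives `False`: against the blindfold or the commitment, a host can raise its own odds by funding other boxes, which is why those stories end with more money on stage.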

### 5 Boxes Problem

This time the game show wants to see who is best at prediction. Each of the 4 superintelligent hosts must try to predict which box the human will pick, and is rewarded based on who has the most successful predictions. To make it fun, each host has \$100 and 1 grenade (which kills you if you choose it) that they can put in any of the boxes, without the other hosts knowing which they’ve picked. Since they know humans dislike dying, there are 5 transparent boxes this time to make sure there’s a way for the human to avoid all grenades. The player can see how much money or how many grenades are in each box, and they’re also told which box each host is betting on, and which incentive and grenade were theirs.

Our first human is CDT again. The hosts know he’d never pick a box with a grenade, and otherwise would pick the one with the most money. They each randomly picked a box to incentivise, selected it as their prediction, and booby-trapped a different box to maximise their chances of winning. CDT naturally picks the most valuable box that doesn’t have a grenade, and leaves with expected returns of ~\$152. He thinks he did pretty well, all things considered.
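A Monte Carlo sketch of CDT’s round, for anyone who wants to check the ~\$152 figure. The modelling choices are assumptions rather than something the post specifies: boxes start empty apart from the hosts’ money, each host funds a uniformly random box and bombs a uniformly random different box, and CDT breaks ties at random. `deal`, `cdt_pick` and `estimate` are hypothetical names.

```python
import random

BOXES, HOSTS, PRIZE = 5, 4, 100


def deal(rng):
    """One CDT round: each host funds one random box and booby-traps a
    different random box. Assumes the boxes otherwise start empty."""
    money = [0] * BOXES
    trapped = set()
    for _ in range(HOSTS):
        funded = rng.randrange(BOXES)
        bomb = rng.choice([b for b in range(BOXES) if b != funded])
        money[funded] += PRIZE
        trapped.add(bomb)
    return money, trapped


def cdt_pick(money, trapped, rng):
    """Richest grenade-free box, ties broken uniformly at random.
    Four grenades and five boxes guarantee at least one safe box."""
    safe = [b for b in range(BOXES) if b not in trapped]
    best = max(money[b] for b in safe)
    return rng.choice([b for b in safe if money[b] == best])


def estimate(trials=100_000, seed=0):
    """Average dollars CDT takes home over many simulated rounds."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        money, trapped = deal(rng)
        total += money[cdt_pick(money, trapped, rng)]
    return total / trials


print(round(estimate(), 1))
```

EDT’s round in the next paragraph comes out the same under this model, since she ends up just looking and taking the best safe box.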

EDT is really struggling to come up with a clever blindfold strategy this time, and doesn’t want to blow herself up by mistake. She mournfully abandons schemes to manage the news, looks at the boxes, and picks the best one. Expected return: ~\$152.

FDT does have a plan, though it took some thinking. She decides she won’t pick a box that a host predicted unless that host put their grenade in box 5 and their incentive in one of the first two boxes; among the boxes that are left, she’ll pick the one with the most predictions from those cooperating hosts, and after that the one worth the most to her. She commits to do this even if she has to pick a grenade and kill herself to do so. Since each host guessed she’d set this policy, and had no hope of winning if they didn’t play along, they have already put their grenades in box 5, predicted and incentivised one of the first two boxes, and hoped to get lucky. This gets an expected return of ~\$275, which I’m pretty sure is the maximum possible, assuming hosts who are competing zero-sum and are noncooperative unless positively incentivised.
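Under the cooperative outcome just described, the ~\$275 can be computed exactly instead of simulated. The sketch assumes each of the four hosts independently funds (and predicts) box 1 or box 2 with equal probability, which is one reading of “hoped to get lucky”. Because every host predicts the box it funds, the box with the most predictions is also the richest, so FDT’s pick is simply the larger of the two. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def fdt_expected_payoff(hosts=4, prize=100, random_pick=False):
    """Exact expectation over every way the hosts can split between boxes 1 and 2.

    Assumes each host bombs box 5, then funds and predicts box 1 or box 2 with
    equal probability, independently of the other hosts.
    """
    splits = list(product((1, 2), repeat=hosts))
    total = Fraction(0)
    for split in splits:
        box1 = prize * split.count(1)
        box2 = prize * split.count(2)
        if random_pick:
            total += Fraction(box1 + box2, 2)  # coin flip between the two funded boxes
        else:
            total += max(box1, box2)           # most predictions == most money here
    return total / len(splits)


print(fdt_expected_payoff())                  # 275
print(fdt_expected_payoff(random_pick=True))  # 200
```

The 16 equally likely splits pay 400 (4-0), 300 (3-1) or 200 (2-2), which averages to exactly 275. Passing `random_pick=True` instead averages the two funded boxes and gives 200, the figure one gets by assuming the player picks between them at random rather than taking the richer one.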

[It’s a bit of a stretch of the rules, but EDT does have an ideal strategy as well, after this point: blindfold herself, walk on stage, yell to FDT, who is watching from the audience, to ask which box FDT would endorse picking, and then pick that box without looking. If your decision theory ever tells you to do this, you probably have a bad decision theory. At least she’s not CDT, who doesn’t see how blindfolds and phoning a friend can cause boxes to have more money in them.]

• 14 Nov 2022 20:28 UTC

I don’t think these add very much, and I further think, if you’re proposing more than one predictor, you have to acknowledge that they’re going to model each other, and give SOME reason for them not to be perfectly symmetrical. If they’re symmetrical and know it, then (presuming they very slightly prefer spending less over spending more, if they can’t actually influence the outcome) all boxes have \$1, for any player action.

Assuming stupid or semi-predictive omegas, precommitting to pick the second-fullest box seems to be the right strategy for 4-boxes (incentive for predictors to put money in everyone else’s boxes). Only a slight modification of the naive strategy for 5-boxes: break ties by picking the highest-numbered box (provide Schelling point for the predictors to all bet on, and fund, the same box).

• Why would precommitting to pick the second-fullest box give an incentive for predictors to put money in everyone else’s boxes?

• Only for stupid/partial predictors. If they predict each other and know they’re symmetrical, nothing the agent does matters. If they’re trying to optimize against other predictors who they don’t think are as smart as they, they can hope that some will make mistakes, and they will do their best not to be biggest by putting money in all their opponents’ boxes. This is the same incentive as “pick the smallest amount” would be, but in the case that any mistakes DO get made, and there are varying amounts, the agent prefers more money to less.

• Couldn’t you equally argue that they will do their best not to be smallest by not putting money in any of their opponents’ boxes? After all, “second-fullest” is the same as “third-emptiest”.

• I’m a fan, particularly of the first.

• 14 Nov 2022 17:13 UTC

You seem to be assuming the human moves first in logical time, before the superintelligent hosts. You also seem to be assuming that the superintelligent hosts are using CDT (if they use FDT, then by symmetry considerations all of their possible actions have equal payoff, so what they do is arbitrary). Any particular reason for these assumptions?

Where do the numbers \$152 and \$275 come from? I would have thought they should be \$100 and \$200, respectively.

In the 5 box problem, why doesn’t FDT force all of the incentives into box 1, thus getting \$400?

• The hosts aren’t competing with the human, only each other, so even if the hosts move first logically they have no reason or opportunity to try to dissuade the player from whatever they’d do otherwise. FDT is underdefined in zero-sum symmetrical strategy games against psychological twins, since it can foresee a draw no matter what, but choosing the optimal strategy to get to the draw still seems better than playing dumb strategies on purpose and then drawing anyway.

Why do you think they should be \$100 and \$200? Maybe you could try simulating it?

What happens if FDT tries to force all the incentives into one box? If the hosts know exactly what every other host will predict, what happens to their zero-sum competition and their incentive to coordinate with FDT?

• If the hosts move first logically, then TDT will lead to the same outcomes as CDT, since it’s in each host’s interest to precommit to incentivising the human to pick their own box—once the host has precommitted to doing this, the incentive works regardless of what decision theory the human uses. In math terms, if x is the choice of which box to incentivize (with “incentivize your own box” being interpreted as “don’t place any money in any of the other boxes”), the human gets to choose a box f(x) on the basis of x, and the host gets to choose x=g(f) on the basis of the function f, which is known to the host since it is assumed to be superintelligent enough to simulate the human’s choices in hypothetical simulations. By definition, the host moving first in logical time would mean that g is chosen before f, and f is chosen on the basis of what’s in the human’s best interest given that the host will incentivize box g(f). But then the optimal strategy is for g to be a constant function.

Regarding \$100 and \$200, I think I missed the part where you said the human picks the box with the maximum amount of money—I was assuming he picked a random box.

Regarding the question of how to force all the incentives into one box, what about the following strategy: choose box 1 with probability 1 - (400 - x) epsilon, where x is the payoff of box 1. Then it is obviously in each host’s interest to predict box 1, since it has the largest probability of any box, but then it is also in each host’s interest to minimize 400 - x i.e. maximize x. This is true even though the hosts’ competition is zero-sum.

• “Regarding the question of how to force all the incentives into one box, what about the following strategy: choose box 1 with probability 1 - (400 - x) epsilon, where x is the payoff of box 1. Then it is obviously in each host’s interest to predict box 1, since it has the largest probability of any box, but then it is also in each host’s interest to minimize 400 - x i.e. maximize x. This is true even though the hosts’ competition is zero-sum.”

If the hosts are all predicting box 1, why does it matter with what probability the human picks box 1? (If the hosts’ payoffs for all-predict-correctly and all-predict-incorrectly are different, then their game isn’t zero-sum.)

• “If the hosts move first logically, then TDT will lead to the same outcomes as CDT, since it’s in each host’s interest to precommit to incentivising the human to pick their own box”

It’s in the hosts’ interests to do that if they think the player is CDT, but it’s not in their interests to commit to doing that. They don’t lose anything by retaining the ability to select a better strategy later, after reading the player’s mind.

• Yes they do. For simplicity suppose there are only two hosts, and suppose host A precommits to not putting money in host B’s box, while host B makes no precommitments about how much money he will put in host A’s box. Then the human’s optimal strategy is “pick host A’s box with probability 1 - x epsilon, where x is the amount of money in host A’s box”. This incentivizes host B to maximize the amount in host A’s box (resulting in payoff ~101 for the human), but it would have been better for him if he had precommitted to do the same as A, since then by symmetry his box would have been picked half the time instead of 101 epsilon of the time.
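The two-host arithmetic from this comment, written out. Here $x$ is the money in host A’s box, which only B can change, so $x \in \{1, 101\}$; B’s own box stays at \$1 because A has precommitted not to fund it, and $\varepsilon$ is assumed small ($\varepsilon < 1/202$).

$$P(\text{A picked}) = 1 - x\varepsilon, \qquad P(\text{B picked}) = x\varepsilon$$

$$\max_{x \in \{1,\,101\}} P(\text{B picked}) = 101\varepsilon < \tfrac{1}{2}, \qquad \mathbb{E}[\text{human's take}] = 101\,(1 - 101\varepsilon) + 1 \cdot 101\varepsilon \approx 101$$

So B’s best reply under this strategy still leaves him well below the $\tfrac{1}{2}$ he would get in the symmetric case where both hosts make A’s precommitment.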

• Regarding the second problem, there’s a Nash equilibrium where two agents bomb box 1 while betting on box 2 and two other agents bomb box 2 while betting on box 1. No agent can unilaterally change its strategy to score more.

• They can though? Bomb box 5, incentivise box 1 or 2, bet on box 3 or 4. Since FDT’s strategy puts rewarding cooperative hosts above money or grenades, she picks the box that rewards the defector and thus incentivises all 4 to defect from that equilibrium. (I thought over her strategy for quite a bit and there are probably still problems with it but this isn’t one of them)

• Both of these variants seem strictly worse than standard Newcomb. They introduce more complications and impute unjustified agent beliefs about what the superintelligent hosts will do.

• They are more difficult, yes, but they explore the space of strategies. Or do you have simpler versions in mind that also involve seemingly contradictory choices?

• Update: Both Eva and I seem to agree that it seems somewhat unfair to compare CDT and EDT where we give the EDT agent the option of blindfolding themselves without giving CDT the option to engage in similar manipulations (Eva suggests an explosive collar).

I incorrectly asserted that we were allowing EDT to additionally pre-commit to a particular strategy, but EDT doesn’t need to pre-commit here, or in Newcomb’s problem, because it treats current decisions as evidence about past predictions, whereas CDT holds these predictions constant.

Original comment: Hmm… I don’t think you can compare CDT and EDT by creating a scenario where EDT can pre-commit (and pre-blind itself to information) while forbidding CDT from doing the same. It seems that if you do that, then CDT and EDT would pre-commit to reasoning like the FDT agent.

• EDT isn’t precommitting to anything here; at every step she does what she judges to be the best choice. It’s still a valid complaint that it’s unfair to give her a blindfold, though. If CDT found out about the rules of the game before the hosts made their predictions, he’d make himself an explosive collar that he can’t take off and that automatically kills him unless he chooses the box with the least money, and he’d get the same outcome as FDT; EDT would do that as well. For the blindfold strategy, EDT only needs to learn the rules before she sees what’s in the boxes, and the technology required is much simpler. I mostly wrote it this way because I think it’s a much more interesting way to get stronger behaviour than the commitment-collar trick.

• Wasn’t EDT pre-committing to the strategy of weighing the left two boxes and the right two boxes and then deciding to randomly pick one of the heavier pair? Or are you saying that a blinded EDT automatically adopts this strategy without precommitment?

This comment was edited to fix swapping “lighter” and “heavier”.

• Not whichever is lighter, one of whichever pair is heavier. Yes, I claim an EDT agent upon learning the rules, if they have a way to blind themself but not to force a commitment, will do this plan. They need to maximise the amount of incentive that the hosts have to put money in boxes, but to whatever extent they actually observe the money in boxes they will then expect to get the best outcome by making the best choice given their information. The only middle ground I could find is pairing the boxes, and picking one from the heavier pair. I’d be very happy to learn if I was wrong and there’s a better plan or if this doesn’t work for some reason.

• Oh, silly me. Of course, EDT doesn’t need to pre-commit, because EDT just does whatever gives the highest expected value without caring whether there’s a causal impact or not. So when given the choice between weighing the boxes in pairs and weighing them all and picking the heaviest, it’s happy to weigh in pairs, because that increases how much money it expects to win.