Do Timeless Decision Theorists reject all blackmail from other Timeless Decision Theorists?
I have a technical question about Timeless Decision Theory, to which I didn’t manage to find a satisfactory answer in the published MIRI papers.
(I will just treat TDT, UDT, FDT and LDT as the same thing, because I do not understand the differences. As far as I understand, they are just different degrees of formalization of the same idea.)
On page 3 of the FDT paper ( https://arxiv.org/pdf/1710.05060.pdf ) it is claimed that TDT agents “resist extortion in blackmail dilemmas”.
I understand why a TDT agent would resist extortion, when a CDT agent blackmails it. If a TDT agent implements an algorithm that resolutely rejects all blackmail, then no CDT agent will blackmail it (provided the CDT agent is smart enough to be able to accurately predict the TDT’s action), so it is rational for the TDT to implement such a resolute blackmail rejection algorithm.
But I do not believe that a TDT agent rejects all blackmail when the blackmail comes from another TDT agent. The TDT blackmailer could implement a resolute blackmailing algorithm that sends the blackmail regardless of whether the extortion succeeds, and then the TDT who receives the blackmail no longer has such a clear-cut incentive to implement a resolute blackmail rejection algorithm, which makes the whole situation much more complicated.
In fact, it appears to me that the very logic that would make a TDT resolutely reject all blackmail is precisely the logic that would make a TDT resolutely send all blackmail.
I haven’t yet managed to figure out what two TDTs would actually do in a blackmail scenario, but I will now give an argument why resolutely rejecting all blackmail is definitely not the correct course of action.
My Claim: There is a blackmail scenario involving two TDTs in which the TDT that gets blackmailed does not implement a resolute blackmail rejection algorithm.
We consider the following game:
We have two TDT agents A and B.
A possesses a delicious cookie worth 1 utility.
B has drafted a letter saying “Hand over your cookie to me, or I will destroy the entire universe”.
The game proceeds as follows:
First, B can choose whether to send his blackmail letter to A or not.
Secondly, A can choose whether to hand over his cookie to B or not.
(I do give A the option to hand over the cookie even when the blackmail has not been sent, but that option is probably just stupid and will not be taken.)
We assign the following utilities in each situation:
If B doesn’t send and A doesn’t hand over the cookie, then A gets 1 utility and B gets 0 utility.
If B doesn’t send and A hands over the cookie, then A gets 0 utility and B gets 1 utility.
If B sends and A doesn’t hand over the cookie, then A gets −∞ utility and B gets −∞ utility.
If B sends and A hands over the cookie, then A gets 0 utility and B gets 1 utility.
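The payoff table above can be written down directly. Here is a minimal sketch in Python, where the "destroy the universe" outcome is represented by `float("-inf")`; the variable names are my own illustration, not part of any formalism from the MIRI papers:

```python
# Payoff table for the blackmail game described above.
# Keys are (B sends blackmail, A hands over cookie);
# values are (A's utility, B's utility).
NEG_INF = float("-inf")

payoffs = {
    (False, False): (1, 0),              # no blackmail, A keeps the cookie
    (False, True):  (0, 1),              # A hands over the cookie unprompted
    (True,  False): (NEG_INF, NEG_INF),  # B sends, A rejects: universe destroyed
    (True,  True):  (0, 1),              # B sends, A complies
}
```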
This tree shows the utilities of all the outcomes.
Since we are dealing with TDTs, before the game even starts both agents will think about which decision algorithms to implement, and will implement the algorithm that leads to the best outcome for them.
I will only consider two possible algorithms for A:
The first algorithm is a causal decision theory algorithm that hands over the cookie when B sends, and keeps the cookie otherwise.
The second algorithm is the resolute blackmail rejection algorithm that always keeps the cookie no matter what.
If we can show that the first algorithm outperforms the second algorithm, then we have shown that TDT does not recommend implementing the second algorithm, which proves my claim.
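The comparison between the two candidate algorithms can be sketched in code. This is only an illustration under the assumption that B sends resolutely, which is the scenario under consideration; the function names are my own, not drawn from any MIRI formalism:

```python
NEG_INF = float("-inf")

# (B sends, A hands over) -> (A's utility, B's utility), as in the table above.
payoffs = {
    (False, False): (1, 0),
    (False, True):  (0, 1),
    (True,  False): (NEG_INF, NEG_INF),
    (True,  True):  (0, 1),
}

def cdt_handover(b_sends):
    """Algorithm 1: hand over the cookie iff the blackmail was sent."""
    return b_sends

def resolute_rejection(b_sends):
    """Algorithm 2: never hand over the cookie, no matter what."""
    return False

def a_utility(b_sends, a_algorithm):
    """A's utility when B's sending decision and A's algorithm are fixed."""
    return payoffs[(b_sends, a_algorithm(b_sends))][0]

# Against a resolute blackmailer (B always sends), algorithm 1 beats algorithm 2:
# a_utility(True, cdt_handover) is 0, while a_utility(True, resolute_rejection) is -inf.
```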
When A tries to decide which of these two algorithms is better, it faces the decision problem given by this tree:
So what does TDT recommend here?
To figure this out, we make the following observation:
The two decision trees I have drawn are equivalent! They describe the exact same game, just with the roles of A and B reversed.
If A implements a resolute blackmail rejection algorithm, then A is essentially sending a weird form of blackmail to B.
This equivalence allows us to figure out which of the two algorithms TDT prefers.
In the logically counterfactual scenario in which TDT recommends that A should “resolutely reject” in the second decision tree, TDT also recommends that B should “send” in the first decision tree. In this hypothetical scenario B sends, A rejects, and both agents get −∞ utility.
That outcome is much worse than anything you can get when implementing CDT.
Therefore TDT does not recommend resolutely rejecting all blackmail in this particular scenario. QED
This argument does not prove that TDTs actually implement CDT; there might be even better decision algorithms. It only proves that resolute blackmail rejection is not the correct choice.
I would be grateful to hear opinions about whether or not my argument is correct.