An alternative approach could be that TDT resolves to never let itself be outperformed by any other decision theory, because of evolutionary considerations as discussed here, even if that requires a large sacrifice of immediate utility (e.g. two-boxing and taking $1,000 alongside CDT, rather than one-boxing and taking $1 million but letting CDT walk away with $1,001,000). I don't currently know what to think about that, except that it makes my head spin; it also sounds like a rather Unfriendly form of AI.
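For concreteness, here is a minimal sketch of the payoff arithmetic in that scenario. The function and numbers are illustrative assumptions on my part: Omega fills the opaque box iff it predicts the TDT agent one-boxes, and the CDT agent always two-boxes.

```python
def payoffs(tdt_one_boxes: bool) -> tuple[int, int]:
    """Return (tdt_payoff, cdt_payoff) for one round of the scenario."""
    big, small = 1_000_000, 1_000
    filled = big if tdt_one_boxes else 0  # Omega's prediction assumed accurate
    tdt = filled if tdt_one_boxes else filled + small
    cdt = filled + small                  # CDT takes both boxes regardless
    return tdt, cdt

for choice in (True, False):
    tdt, cdt = payoffs(choice)
    print(f"TDT one-boxes={choice}: TDT=${tdt:,}  CDT=${cdt:,}  "
          f"TDT outperformed: {cdt > tdt}")
```

Two-boxing is the only policy under which TDT is never outperformed, but it costs TDT $999,000 of direct utility, which is exactly the head-spinning trade-off.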
I think that merely noting that the TDT agent could achieve the goal of not being outperformed by an agent with another decision theory, if it had that goal, is enough to undermine Problem 1 as a criticism of TDT. If it predicts that undermining a competitor has sufficient instrumental value to offset the loss of immediate direct rewards of terminal value, then it will undermine the competitor. If it (correctly) does not make that prediction, then it is rational for it to seek the greater reward for itself, even if this helps another agent even more.
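As a sketch, the rule in the previous paragraph reduces to a single comparison; the function name and inputs below are hypothetical, standing in for whatever estimates the agent actually computes.

```python
def should_undermine(instrumental_value: float, direct_reward_loss: float) -> bool:
    """Undermine the competitor only if the predicted instrumental value
    of doing so offsets the loss of immediate direct reward."""
    return instrumental_value > direct_reward_loss

# In the scenario above, two-boxing forgoes $999,000 of TDT's own reward
# while denying CDT $1,000,000; it is rational only if the agent assigns
# undermining CDT at least that much instrumental value.
print(should_undermine(instrumental_value=0.0, direct_reward_loss=999_000.0))  # False
```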