The TDT agents are maximizing their number of descendants; you have no right to criticize them for failing to maximize their share of the population. The whole point of the prisoner’s dilemma is that it’s not a zero-sum game, but if you count the population of Defectbots against the TDT agents, you are treating it as one.
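The non-zero-sum point can be checked directly from a payoff matrix. The numbers below are the standard illustrative Prisoner’s Dilemma values (T=5, R=3, P=1, S=0), not anything specified by the scenario; payoffs are read as numbers of descendants.

```python
# Standard illustrative Prisoner's Dilemma payoffs (T=5, R=3, P=1, S=0).
# payoff[(my_move, their_move)] = my number of descendants.
payoff = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # punishment for mutual defection
}

# The total number of descendants produced depends on the outcome,
# so the game is not zero sum:
for a in "CD":
    for b in "CD":
        total = payoff[(a, b)] + payoff[(b, a)]
        print(f"{a} vs {b}: total descendants = {total}")
```

Mutual cooperation produces 6 descendants in total, mutual defection only 2. Measuring each side’s *share* of the population discards exactly this difference, which is what treating the game as zero sum means.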
So can we show that TDT would play perfectly in such a scenario? I think yes.
Your decisions interact in complex ways with your peers’ decisions, your descendants’ decisions, etc. There might be a nice simple formula that describes this, but:
These are TDT agents, so they all behave the same way. At least, the ones in the same timestep behave the same; the ones in different timesteps are in different situations but have the same values. Since they have (overall) the same values as past generations, together they will implement a coherent strategy.
And yet if they all switched to being cliquebots they would not only drive out the defectbots, but would have far more children in the long run.
And if they all switched to being paperclip maximisers they would make more paperclips. Neither of these is going to help them maximise their actual utility function.
> The TDT agents are maximizing their number of descendants; you have no right to criticize them for failing to maximize their share of the population. The whole point of the prisoner’s dilemma is that it’s not a zero-sum game, but if you count the population of Defectbots against the TDT agents, you are treating it as one.

> And yet if they all switched to being cliquebots they would not only drive out the defectbots, but would have far more children in the long run.
They would not have far more children in the long run. Their descendants would have more children, but their utility function doesn’t care about that.
Edit: and if they do start caring about grandchildren the problem stops being a straightforward Prisoners’ Dilemma.
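The gap between the two objectives can be made concrete with toy numbers (entirely made up for illustration): one strategy yields more direct children, the other trades direct children for a faster-growing lineage.

```python
# Toy numbers, made up for illustration: strategy A maximizes *direct*
# descendants, strategy B (e.g. switching to cliquebot) trades direct
# children for more prolific later generations.
def lineage_size(children_per_generation, generations):
    """Size of generation `generations`, assuming every agent in generation g
    has children_per_generation[g] children (last entry repeats)."""
    size = 1
    for g in range(generations):
        size *= children_per_generation[min(g, len(children_per_generation) - 1)]
    return size

a_direct = lineage_size([4, 2], 1)  # strategy A: 4 direct children
b_direct = lineage_size([3, 3], 1)  # strategy B: 3 direct children
a_total = lineage_size([4, 2], 5)   # 4 * 2**4 = 64 fifth-generation descendants
b_total = lineage_size([3, 3], 5)   # 3**5 = 243 fifth-generation descendants

print(a_direct, b_direct)  # 4 3
print(a_total, b_total)    # 64 243
```

An agent whose utility function counts only direct children prefers A, even though B’s lineage overtakes A’s within a few generations — which is why “their descendants would have more children” does not move an agent with this utility function.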
> So can we show that TDT would play perfectly in such a scenario? I think yes.
>
> Your decisions interact in complex ways with your peers’ decisions, your descendants’ decisions, etc. There might be a nice simple formula that describes this, but:
>
> These are TDT agents, so they all behave the same way. At least, the ones in the same timestep behave the same; the ones in different timesteps are in different situations but have the same values. Since they have (overall) the same values as past generations, together they will implement a coherent strategy.

> And if they all switched to being paperclip maximisers they would make more paperclips. Neither of these is going to help them maximise their actual utility function.
Fair point. I failed to spot initially that the trick lay in equivocating between “maximise direct descendants” and “maximise all descendants”.