Again, I don’t think Exercise 1 is that simple. Also, if you return the agent to the pool after playing the game (“sample with replacement”) then everyone has infinitely many children, so it is not enough to say that they maximize children.
See, I don’t think I underspecified. Omega doesn’t do something to every agent every time; in round N, Omega picks three at random and plays the game with them. Then in round N+1, it picks three at random (from the pool including the children of round N) and plays the game with them, et cetera.
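A minimal sketch of the round structure described above. The actual game from the exercises isn't reproduced in this thread, so the payoff here is a hypothetical three-player dilemma: 2 children per cooperating partner, plus 1 for defecting. Agents are fixed-move types "C" and "D" rather than TDT/CDT reasoners. Players are removed after their round, and only their children go back into the pool; returning the players themselves would be the "sample with replacement" variant objected to earlier.

```python
import random
from collections import Counter

def children(own_move, other_moves):
    """Hypothetical payoff (an assumption, not the game from the exercises):
    2 children per cooperating partner, plus 1 extra child for defecting."""
    return 2 * sum(m == "C" for m in other_moves) + (1 if own_move == "D" else 0)

def run_rounds(pool, n_rounds, seed=0):
    """Each round, Omega draws three agents at random from the current pool,
    plays the game with them, removes them, and adds their children
    (same type as the parent) to the pool for later rounds."""
    rng = random.Random(seed)
    pool = list(pool)
    for _ in range(n_rounds):
        if len(pool) < 3:
            break  # not enough agents left for another game
        players = rng.sample(pool, 3)  # three distinct agents, no replacement
        for p in players:
            pool.remove(p)  # players do not return to the pool
        for i, p in enumerate(players):
            others = players[:i] + players[i + 1:]
            pool.extend([p] * children(p, others))
    return Counter(pool)
```

For example, `run_rounds(["C"] * 10 + ["D"] * 10, 50)` returns the counts of each type after 50 rounds; the exact numbers depend on the seed. The point it illustrates is that who an agent meets in round N+1 depends on how every type fared in earlier rounds.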
I agree that if you allow the agents (and not just their children) back into the game, the conditions for folly aren’t met. The point is that you really need a delicately defined setup for TDT to be completely shortsighted, even with a shortsighted utility function.
OK, so now that you’ve pinned it down, my main complaint applies: the distribution of partners that the agent will eventually have for the game is a function of the agent’s strategy. You can’t treat them separately and conclude simply that it should cooperate against 1 CDT and 1 TDT. Thus, in doing exercise 1, choosing between strategies, the TDT must do exercises 2-4 and more to determine which strategy has the best expected value. And, since we’re now talking about expected value, the calculation must involve the utility as a function of the number of children. You can set it to be the number of children, but you have to use that somewhere, and not just monotonicity.
Why?
That was badly phrased. I meant: the calculation must involve the utility function, the function that converts the number of children into utiles. (original corrected)
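A small illustration of why monotonicity alone doesn't settle an expected-value comparison. The two lotteries over numbers of children are hypothetical, chosen only to make the point: both utility functions below are increasing in the number of children, yet they rank the strategies differently.

```python
import math

# Hypothetical outcome distributions: (probability, number of children).
SAFE = [(1.0, 2)]             # exactly 2 children
RISKY = [(0.5, 0), (0.5, 5)]  # 0 or 5 children, equally likely
STRATEGIES = {"safe": SAFE, "risky": RISKY}

def expected_utility(lottery, u):
    """Expected utiles of a lottery under utility function u."""
    return sum(p * u(n) for p, n in lottery)

def linear(n):
    """Utility equal to the number of children."""
    return n

def best(strategies, u):
    """Name of the strategy with the highest expected utility under u."""
    return max(strategies, key=lambda name: expected_utility(strategies[name], u))
```

With `linear`, RISKY wins (2.5 vs 2). With the equally monotone `math.sqrt`, SAFE wins (about 1.41 vs about 1.12), so the choice depends on the utility function itself.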
Huh, you may be right. Let me ponder this when I’m less tired.
So if I understand you right, even with the short-sighted utility function, there’s an echo of Parfit’s Hitchhiker here: what TDT decides on these problems actually controls which situations the agent finds itself in, and thus its possible payoff matrices. Since TDT is supposed to get Parfit’s Hitchhiker right, therefore, it should give the long-term-winning answer even in this case.
Well, there are some more caveats (it’s not clear that agents in the first round would do this, since TDT doesn’t win the Counterfactual Mugging, and if agents in round N don’t think that way, then what about round N+1...), but you’re right that the simple calculation doesn’t actually suffice. Drat, and thanks.
Its goal is still different, though (if we restore some missing pieces): it wants to game the probabilities of encountering certain opponents so that a single round that contains a TDT agent delivers the most reward. It just so happens that getting rid of DefectBots serves this purpose, but if the opponents were CooperateBots, it looks like TDTs would drive themselves to extinction (or farm the opponents) to maximize the expected number of cooperating opponents they can defect against (for each instance where there’s a TDT agent in the round). (I didn’t check this example carefully, so I could be wrong, but the principle it exemplifies seems to hold.)
That’s a seriously sick idea, but there doesn’t seem to be a way to both set up such a favorable matchup and exploit it. Or is there?