About 2TDT-1CDT. If two groups are mixed into a PD tournament, and each group can decide on a strategy beforehand that maximizes that group’s average score, and one group is much smaller than the other, then that smaller group will get a higher average score. So you could say that members of the larger group are “handicapped” by caring about the larger group, not by having a particular decision theory. And it doesn’t show reflective inconsistency either: for an individual member of a larger group, switching to selfishness would make the larger group worse off, which is bad according to their current values, so they wouldn’t switch.
Edit: You could maybe say that TDT agents cooperate not because they care about one another (a), but because they’re smart enough to use the right decision theory that lets them cooperate (b). And then the puzzle remains, because agents using the “smart” decision theory get worse results than agents using the “stupid” one. But I’m having a hard time formalizing the difference between (a) and (b).
But the situation isn’t symmetrical: if you reverse the setup to 2 CDT agents and 1 TDT agent, the TDT agent doesn’t do better than the CDT agents. So it does seem like the puzzle has something to do with decision theory, and is not just about smaller vs. larger groups? (Sorry, I may be missing your point.)
I think you can make it more symmetrical by imagining two groups that can both coordinate within themselves (like TDT), but where each group cares only about its own welfare and not the other group’s. Then the larger group will choose to cooperate and the smaller one will choose to defect. Both groups are doing as well as they can for themselves; the game just favors those whose values extend to a smaller group.
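As a toy check of this group framing (a sketch under my own assumptions: a standard one-shot PD payoff matrix with T=5, R=3, P=1, S=0, group sizes of 9 and 1, and each group committing to a single strategy played against everyone in a round-robin), here’s how the per-member averages come out:

```python
# Toy model of the two-group PD argument above. The payoff values and
# group sizes are illustrative assumptions, not from the original comment.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def group_averages(n_big, n_small, s_big, s_small):
    """Per-member average score when the big group plays s_big and the
    small group plays s_small, everyone playing everyone else once."""
    games = n_big + n_small - 1  # games each agent plays
    big = ((n_big - 1) * PAYOFF[(s_big, s_big)]
           + n_small * PAYOFF[(s_big, s_small)]) / games
    small = ((n_small - 1) * PAYOFF[(s_small, s_small)]
             + n_big * PAYOFF[(s_small, s_big)]) / games
    return big, small

# With 9 cooperators and 1 defector, the lone defector's average is higher:
big, small = group_averages(9, 1, "C", "D")
print(big, small)  # ~2.67 vs 5.0

# But cooperating is still the big group's best choice for itself,
# given that the small group defects:
for s in ("C", "D"):
    print(s, group_averages(9, 1, s, "D")[0])
```

In this toy model the lone defector’s average (5.0) beats the cooperators’ (≈2.67), yet cooperating still beats defecting from the big group’s own perspective (≈2.67 vs. 1.0), which matches the “no reflective inconsistency” point above.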
I think I kind of get what you’re saying, but it doesn’t seem right to model TDT as caring about all other TDT agents, since a TDT agent would exploit other TDT agents if it could do so without negative consequences to itself, e.g., if a TDT AI were in a one-shot game where it unilaterally decides whether or not to attack and take over another TDT AI.
Maybe you could argue that the TDT agent would refrain from attacking because of considerations like its decision to attack being correlated with other AIs’ decisions to attack it in other situations/universes, but that’s still not the same as caring for other TDT agents. I mean, the chains of reasoning/computation you would go through in the two cases seem very different.
Also, it’s not clear to me what implications your idea has even if it were correct. For example, what does it suggest about what the right decision theory is?
BTW do you have any thoughts on Vanessa Kosoy’s decision theory ideas?
I don’t fully understand Vanessa’s approach yet.
About caring about other TDT agents, it feels to me like the kind of thing that should follow from the right decision theory. Here’s one idea. Imagine you’re a TDT agent that has just been started / woken up. You haven’t yet observed anything about the world, and haven’t yet observed your utility function either—it’s written in a sealed envelope in front of you. Well, you have a choice: take a peek at your utility function and at the world, or use this moment of ignorance to precommit to cooperate with everyone else who’s in the same situation. That includes all other TDT agents who ever have woken up or ever will, and who are smart enough to recognize the choice.
It seems likely that such wide cooperation will increase total utility, and so increase expected utility for each agent (ignoring anthropics for the moment). So it makes sense to make the precommitment first, and only then open your eyes and start observing the world, your utility function, and so on. And in your proposed problem, where a TDT agent has the opportunity to kill another TDT agent in its sleep to steal five dollars (destroying more utility for the victim than the attacker gains), the precommitment would stop it from doing so. Does this make sense?
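The expected-utility arithmetic behind that last step can be made concrete. This is a sketch with numbers I’m making up purely for illustration: the attacker gains 5 utility, the victim loses 100, and behind the veil (before peeking at your utility function) you are equally likely to end up in either role.

```python
# Veil-of-ignorance arithmetic for the "kill to steal five dollars" case.
# The specific payoffs and the 50/50 role assignment are illustrative
# assumptions, not from the original comment.
gain_to_attacker = 5
loss_to_victim = 100

# Expected utility per agent if attacking is allowed, vs. if everyone
# precommits not to attack:
ev_without_precommitment = 0.5 * gain_to_attacker + 0.5 * (-loss_to_victim)
ev_with_precommitment = 0.0

print(ev_without_precommitment, ev_with_precommitment)  # -47.5 0.0
```

Since the attack destroys more utility than it transfers, the precommitment is the better deal ex ante for every agent, even though ex post the attacker would profit from defecting.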