Lukas Finnveden's comments on “UDT shows that decision theory is more puzzling than ever”

Lukas Finnveden 21 Apr 2026 22:04 UTC
If there’s a population of mostly TDT/UDT agents and few CDT agents (and nobody knows who the CDT agents are) and they’re randomly paired up to play one-shot PD, then the CDT agents do better. What does this imply?
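To make the payoff claim concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not anything from the thread: the PD payoffs T=5, R=3, P=1, S=0 are assumed textbook values, the population is treated as large, UDT agents are modeled as simply cooperating and CDT agents as defecting, and the names `PDPayoffs` and `expected_payoffs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDPayoffs:
    """Row player's payoffs in a one-shot Prisoner's Dilemma.

    ASSUMPTION: common textbook values T=5, R=3, P=1, S=0; the thread
    gives no numbers. Any T > R > P > S gives the same ordering below.
    """
    T: float = 5.0  # temptation: I defect, partner cooperates
    R: float = 3.0  # reward: both cooperate
    P: float = 1.0  # punishment: both defect
    S: float = 0.0  # sucker: I cooperate, partner defects


def expected_payoffs(cdt_fraction: float, g: PDPayoffs = PDPayoffs()) -> tuple[float, float]:
    """Return (UDT payoff, CDT payoff) under uniform random pairing.

    ASSUMPTIONS (hypothetical toy model): the population is large, every
    UDT agent cooperates, every CDT agent defects, and nobody can see a
    partner's decision theory, so no one can condition on it.
    """
    p = cdt_fraction
    udt = (1 - p) * g.R + p * g.S  # UDT meets UDT -> R; meets CDT -> S
    cdt = (1 - p) * g.T + p * g.P  # CDT meets UDT -> T; meets CDT -> P
    return udt, cdt


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.5):
        udt, cdt = expected_payoffs(p)
        print(f"CDT fraction {p:.2f}: UDT earns {udt:.2f}, CDT earns {cdt:.2f}")
```

The CDT advantage works out to (1 - p)(T - R) + p(P - S), which is positive for every CDT fraction p whenever T > R and P > S, so in this toy model CDT comes out ahead no matter how rare it is.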
Maybe you’d get the same effect if you had 100% UDT agents, with 99% of them in blue rooms and 1% in red rooms. The ones in red rooms would reason that they could defect against the ones in blue rooms because their situation is relevantly different: they are a minority that can easily coordinate defection against the majority. (The majority would still be motivated to cooperate, even if they are only correlated with each other.) If so, there’s a sense in which the CDT agents aren’t benefitting any more than they would if they were UDT agents who got a CDT sticker.
(Note that the red room / blue room distinction doesn’t fundamentally break correlations here. Two UDT agents playing a symmetric game against each other, one in a blue room and one in a red room, would still be able to cooperate. What breaks the correlation is that the agents in the red rooms form an easily identifiable minority in a game where a minority can benefit from defection, which isn’t true in a symmetric PD.)
This raises the question of what would happen if all the UDT agents were given different serial numbers. Are there some serial numbers whose holders could defect without it being negative evidence about the other UDT agents?
Hm, for this to work in practice, the UDT agents would need their serial number or room-color assignment to already be present in their prior. If it’s information they receive later on, they should probably be updateless about it and just cooperate even if they’re in the minority.
- Wei Dai 22 Apr 2026 0:11 UTC
  “If so, there’s a sense in which the CDT agents aren’t benefitting any more than they would if they were UDT agents who got a CDT sticker.”
  If the situation is reversed, with CDT in the majority and UDT in the minority, the CDT agents would still defect and thus give the UDT agents no advantage, so CDT seems more “evolutionarily stable” than UDT. This can’t be replicated just by giving UDT agents CDT stickers, since UDT agents with CDT stickers would cooperate if they were in the majority.
  - Lukas Finnveden 22 Apr 2026 0:34 UTC
    I do think there’s a sense in which CDT behavior is evolutionarily selected for in environments where agents can’t see each other’s decision theories.
    I don’t see this as a big problem with UDT. If UDT agents want to be evolutionarily fit relative to other agents in the environment, I think they could adopt CDT behavior and do just as well as CDT agents.
    It’s just that, by virtue of their decision theory (according to themselves), they have the option of giving up evolutionary fitness in exchange for higher utility in the short run. If they care more about short-run utility than evolutionary fitness, then perhaps they take the deal.
    I don’t think this option is a strike against UDT. In any situation where agents care about X but we’re scoring them on Y, there will be scenarios where their Y-score gets hurt if we give them tools for achieving X that trade off against Y.
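The evolutionary-stability point (CDT keeps its edge whether it is the minority or the majority, when agents can’t see each other’s decision theories) can be illustrated with a toy replicator-dynamics sketch in Python. Everything here is an assumption rather than something from the thread: the payoff values, the rule that all UDT agents pick the same action and cooperate only when universal UDT cooperation pays more than universal defection, and the hypothetical helpers `udt_cooperates`, `fitness`, `replicator_step`, and `simulate`. It is a simplification, not a formal model of UDT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDPayoffs:
    """Assumed one-shot PD payoffs (T > R > P > S); not from the thread."""
    T: float = 5.0
    R: float = 3.0
    P: float = 1.0
    S: float = 0.0


def udt_cooperates(udt_fraction: float, g: PDPayoffs) -> bool:
    """Hypothetical UDT policy for this toy model.

    All UDT agents are assumed to choose the same action, so they compare
    "every UDT agent cooperates" with "every UDT agent defects". CDT agents
    always defect, and partners' decision theories are invisible.
    """
    q = udt_fraction
    cooperate_value = q * g.R + (1 - q) * g.S  # partner is UDT w.p. q
    defect_value = g.P  # UDT and CDT partners all defect
    return cooperate_value > defect_value


def fitness(cdt_fraction: float, g: PDPayoffs) -> tuple[float, float]:
    """(UDT, CDT) expected payoffs under random pairing in a large population."""
    p = cdt_fraction
    if udt_cooperates(1 - p, g):
        return (1 - p) * g.R + p * g.S, (1 - p) * g.T + p * g.P
    return g.P, g.P  # universal defection: neither type has an edge


def replicator_step(cdt_fraction: float, g: PDPayoffs) -> float:
    """Discrete-time replicator update: each type's share scales with its payoff.

    Assumes positive payoffs so the mean fitness is nonzero.
    """
    udt_fit, cdt_fit = fitness(cdt_fraction, g)
    mean_fit = cdt_fraction * cdt_fit + (1 - cdt_fraction) * udt_fit
    return cdt_fraction * cdt_fit / mean_fit


def simulate(cdt_fraction: float, steps: int, g: PDPayoffs = PDPayoffs()) -> float:
    """Iterate the replicator update and return the final CDT share."""
    for _ in range(steps):
        cdt_fraction = replicator_step(cdt_fraction, g)
    return cdt_fraction


if __name__ == "__main__":
    for start in (0.01, 0.9):
        print(f"CDT share {start:.2f} -> {simulate(start, 50):.3f}")
```

In this toy model UDT cooperates only while UDT agents make up more than (P - S)/(R - S) of the population (one third with the assumed numbers). Below that threshold everyone defects and both types earn P, so a CDT majority is never eroded; above it, CDT earns strictly more and its share grows. The CDT share therefore never falls, which is one way to cash out the “evolutionarily stable” claim. The sketch does not model UDT agents that deliberately adopt CDT behavior.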