Nit: This is not the best example, because CDT agents very often do want to make themselves more EDT-like (at least the sort of EDT agent that uses its action to make inferences about events that happen after the CDT agent decided to transform itself into an EDT agent). CDT agents want it to be the case that if they later find themselves in a Newcomb-like situation (where Omega analyzed them after the potential transformation), they walk away with the $1M rather than the $10.
(This is not a problem in your argument because you specify “The AI knows that training will modify its decision theory in a way that it thinks will make it less effective at pursuing the goal (by the lights of its current decision theory)”.)
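To make that self-modification incentive concrete, here's a minimal toy calculation (the 99% predictor accuracy and the specific numbers are my own assumptions, just for illustration): because Omega's scan happens after the potential modification, the modification is an ordinary cause of Omega's prediction, so even plain CDT favors becoming a one-boxer beforehand.

```python
# Toy sketch: a CDT agent deciding NOW whether to self-modify into a one-boxer,
# given that Omega will scan the agent AFTER the potential modification.
PREDICTOR_ACCURACY = 0.99  # assumed accuracy of Omega's prediction
BIG_BOX = 1_000_000        # opaque box, filled iff Omega predicts one-boxing
SMALL_BOX = 10             # transparent box (the $10 from the comment above)

def expected_payoff(one_boxer: bool) -> float:
    """Expected winnings given the disposition Omega observes at scan time."""
    if one_boxer:
        # Omega almost certainly predicts one-boxing and fills the big box.
        return PREDICTOR_ACCURACY * BIG_BOX + (1 - PREDICTOR_ACCURACY) * 0
    # Two-boxer: Omega almost certainly predicts two-boxing, so the big box is empty.
    return PREDICTOR_ACCURACY * SMALL_BOX + (1 - PREDICTOR_ACCURACY) * (BIG_BOX + SMALL_BOX)

print("stay a two-boxer:     ", expected_payoff(one_boxer=False))  # ~ 10,010
print("self-modify to one-box:", expected_payoff(one_boxer=True))  # ~ 990,000
```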
This is a bit beside the point and not disagreeing with you, but I just want to mention that I think son-of-CDT (the decision theory that CDT wants to self-modify into) is very, very different from EDT for many of the things I consider most important, e.g. Evidential Cooperation in Large Worlds and most acausal trade. I mention this because I often see people claim that it doesn't make a difference which decision theory an AI ends up with, because they all self-modify into sufficiently similar things anyway. (Not saying you said that at all.)