In any situation in which I should program the AI to defect against the paperclipper, I can write a simple TDT agent and it will decide to defect against the paperclipper.
So, what is that simple TDT agent? You seem to have ignored my argument that it can’t exist, but if you can show me the actual agent (and convince me that it would defect against the paperclipper, if that’s not obvious) then of course that would trump my arguments.
ETA: Never mind, I figured this out myself. See step 11 of http://lesswrong.com/lw/15m/towards_a_new_decision_theory/11lx