In this case “commitment” means something specific.
Suppose you are a selfish CDT agent, and I am considering whether to hire you to clean my house. Once you’re inside my house, you might steal my stuff instead of cleaning it. Suppose that California labour laws require me to pay you up front, and that I know I have no chance of getting my money or stuff back. Say your preference order is “Steal” > “Do the job” > “Don’t get hired”.
Before being hired, you might say “Oh JB, I promise not to steal, please give me this job.” But once you’re inside the house, the only causal effect you can have on the outcome is to steal or not steal. Since CDT evaluates each decision only by the consequences that flow causally from it at the moment it is made, and stealing ranks above doing the job, CDT will always steal. In this case, a CDT-operating agent is incapable of committing not to steal from me.
Therefore I will not hire you to clean my house, and you get minimal utility.
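To make the CDT reasoning concrete, here is a minimal sketch of the game as backward induction. Only the preference order comes from the setup; the numeric utilities (2, 1, 0) and the function names are hypothetical, and JB is assumed to predict the in-house choice perfectly.

```python
# Hypothetical utilities: only the ORDER Steal > Do the job > Don't get hired
# comes from the setup; the specific numbers are an assumption.
UTILITY = {"steal": 2, "do_job": 1, "not_hired": 0}


def cdt_choice_in_house() -> str:
    """Once inside, CDT compares only the causal consequences of each action."""
    return max(("steal", "do_job"), key=lambda action: UTILITY[action])


def jb_hires(predicted_action: str) -> bool:
    """JB (assumed to predict perfectly) hires only if the agent would do the job."""
    return predicted_action == "do_job"


def cdt_outcome() -> str:
    # JB anticipates what the agent will do once inside (backward induction).
    action = cdt_choice_in_house()
    return action if jb_hires(action) else "not_hired"


print(cdt_outcome())  # not_hired
```

The in-house choice is "steal", so JB never hires, and the agent ends up with the minimal utility.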
An FDT agent reasons thusly: suppose FDT endorses stealing. In that universe, JB knows this and does not hire me, so I do not get hired and get minimal utility. If FDT endorses doing the job, JB knows this and does hire me, so I do get hired and then do the job. Therefore I will do the job.
Therefore I will hire an FDT agent to clean my house, and the FDT agent will get the middling utility.
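The FDT reasoning differs in what gets optimized. It picks the output of the decision procedure itself, knowing that JB's hiring decision responds to that output. This is a rough sketch using the same hypothetical utilities as before and a JB who is assumed to know the agent's policy.

```python
# Hypothetical utilities: only the ordering is given in the setup.
UTILITY = {"steal": 2, "do_job": 1, "not_hired": 0}


def outcome_given_policy(policy: str) -> str:
    """JB knows what the agent's decision procedure outputs and hires accordingly."""
    hired = policy == "do_job"
    return policy if hired else "not_hired"


def fdt_policy() -> str:
    # Which output of my decision procedure gives the best outcome, given that
    # JB's hiring decision depends on that same output?
    return max(("steal", "do_job"), key=lambda p: UTILITY[outcome_given_policy(p)])


policy = fdt_policy()
print(policy, outcome_given_policy(policy))  # do_job do_job
```

"steal" leads to not being hired (utility 0) and "do_job" leads to being hired and doing the job (utility 1), so the FDT agent gets the middling outcome.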
You’re stipulating that CDT-me in your thought experiment doesn’t have access to any (psychological) actions that causally bind me to not steal from you. Right? Then sure, CDT-me would steal if he ended up in your house, and you’d want to prevent this.
But you’re also stipulating that I do have access to the action “decide to follow FDT”. That’s something that would causally bind me to not steal from you, if I took it before you made your decision whether to hire me. Why is this action a legitimate option in the hypothetical, while various other non-FDT ways of binding oneself aren’t?
Fair point, and CDT agents self-modifying is a thing that has been studied. You might, e.g., modify yourself to specifically not prefer stealing in this case. My understanding is that such modifications are equivalent to the agent changing its decision theory: since the modifications an agent chooses are predictable from the decision-relevant properties of the scenarios it finds itself in, they can themselves be captured by a description of a decision procedure, which is just what a decision theory is.
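Here is a toy version of the objection and reply. An ordinary CDT agent is given a hypothetical "bind" action that it can take before JB decides, standing in for the self-modification mentioned here. Because binding happens causally upstream of the hiring decision, CDT evaluates it and takes it, which produces the same outcome as FDT in this game. The sketch does not model the cases where son-of-CDT and FDT are said to differ. The utilities and names are assumptions as before.

```python
# Hypothetical utilities: only the ordering is given in the setup.
UTILITY = {"steal": 2, "do_job": 1, "not_hired": 0}


def in_house_action(bound: bool) -> str:
    # A bound agent can no longer steal. An unbound CDT agent picks the best
    # causal consequence at that moment, which is stealing.
    options = ("do_job",) if bound else ("steal", "do_job")
    return max(options, key=lambda a: UTILITY[a])


def outcome(bound: bool) -> str:
    action = in_house_action(bound)
    hired = action == "do_job"  # JB is assumed to predict the in-house action
    return action if hired else "not_hired"


def cdt_pre_hiring_choice() -> bool:
    # Before JB decides, binding is causally upstream of the hiring decision,
    # so ordinary CDT evaluates it like any other action and takes it.
    return max((False, True), key=lambda b: UTILITY[outcome(b)])


bound = cdt_pre_hiring_choice()
print(bound, outcome(bound))  # True do_job
```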
I think the resulting decision theory is called son-of-CDT, and it behaves mostly like FDT but differs in certain circumstances. But this is deep MIRI knowledge that I’m not sure is actually published; I’m going entirely off what I’ve seen ex-MIRIans post on LessWrong.