I have a question, though, about the “adversarial predictor” section: how is world #3 possible? You say:
Agent uses DT1 when rewarded for using DT1 and DT2 when rewarded for using DT2
However, the problem statement said:
Imagine I have a copy of Fiona, and I punish anyone who takes the same action as the copy.
Are we to suppose that the copy of Fiona that the adversarial predictor is running does not know that an adversarial predictor is punishing Fiona for taking certain actions, while the actual Fiona does know this and can thus deviate from what she would otherwise do? If so, then what happens when this assumption is removed, i.e., when we do not inform Fiona that she is being watched (and possibly punished) by an adversarial predictor, or when we do inform copy-Fiona of the same?
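To make the asymmetry concrete, here is a toy sketch (my own illustration, not from the paper; all names and the policy are made up): if the copy is run on exactly the same inputs as the real Fiona, a deterministic policy must match its copy and get punished, and only an informational asymmetry between Fiona and copy-Fiona lets her deviate.

```python
def fiona(observation):
    # Some fixed deterministic policy; the exact rule doesn't matter,
    # only that identical inputs always yield identical outputs.
    return "A" if "watched" in observation else "B"

def run(real_obs, copy_obs):
    predicted = fiona(copy_obs)  # the adversarial predictor simulates the copy
    actual = fiona(real_obs)     # the real Fiona acts
    punished = (actual == predicted)
    return punished

# Case 1: copy and real Fiona see the same information -> they must match,
# so Fiona is always punished, whatever the policy is.
print(run({"watched"}, {"watched"}))  # True

# Case 2: only the real Fiona knows she is watched -> she can deviate
# from the copy and escape punishment.
print(run({"watched"}, set()))        # False
```

Under this toy model, "world #3" is only reachable in Case 2, which is exactly the assumption the question above is probing.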
One would have to ask Eliezer and Nate what they really meant, since it is easy to end up in a self-contradictory setup or to ask a question about an impossible world, like asking what happens if, in the Newcomb setup, the agent decided to switch to two-boxing after the perfect predictor had already put the $1,000,000 in.
My wild guess is that the FDT Fiona from the paper uses a certain decision theory (call it DT1) that does not cope well with worlds containing adversarial predictors. She uses some kind of causal decision-graph logic that leads her astray instead of landing her in the winning world. I also assume that Fiona makes her “decisions” while fully informed of the predictor’s intention to punish her and, CDT-like, just throws her hands in the air and cries “unfair!”
Great post!