‘a’ should use a randomizing device so that he pays 51% of the time and refuses 49% of the time. Omega, aware of this strategy but presumably unable to hack the randomizing device, achieves the best score by predicting ‘pay’ 100% of the time, since that prediction is correct 51% of the time, better than any alternative.
I am making an assumption here about Omega’s cost function—i.e. that Type 1 and Type 2 errors are equally undesirable. So, I agree with cousin_it that the problem is underspecified.
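A minimal sketch of that best-response claim, assuming (as above) that Omega scores itself on raw prediction accuracy with Type 1 and Type 2 errors weighted equally; the policy probabilities compared are illustrative:

```python
# Against an agent who pays with probability 0.51, compare Omega's
# expected accuracy across prediction policies, assuming both kinds
# of error cost the same.

p_pay = 0.51  # agent's randomized probability of paying

def accuracy(q: float) -> float:
    """Expected accuracy when Omega predicts 'pay' with probability q."""
    return q * p_pay + (1 - q) * (1 - p_pay)

for q in (0.0, 0.49, 0.51, 1.0):
    print(f"predict 'pay' with prob {q:.2f}: accuracy {accuracy(q):.4f}")

# accuracy is linear in q with positive slope (since p_pay > 0.5),
# so q = 1.0 (always predict 'pay', accuracy 0.51) beats any mixture.
```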
The constraint P(o=AWARD) = P(a=PAY) that appears in the diagram does not seem to match the problem statement. It is also ambiguous. Are those subjective probabilities? If so, which agent forms them? And, as cousin_it points out, we also need to know the joint probability P(o=AWARD & a=PAY) or the conditional probability P(o=AWARD | a=PAY).
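To make the missing piece concrete, here is a small sketch (the outcome labels and the 0.51 figure are illustrative assumptions, not taken from the diagram): two joint distributions that both satisfy P(o=AWARD) = P(a=PAY) = 0.51 yet give very different values of P(o=AWARD | a=PAY).

```python
# Two joints over (o, a) with identical marginals P(o=AWARD) = P(a=PAY) = p
# but different dependence structure.

p = 0.51

# Case 1: Omega's output is independent of the agent's action.
indep = {
    ("AWARD", "PAY"): p * p,
    ("AWARD", "REFUSE"): p * (1 - p),
    ("NO_AWARD", "PAY"): (1 - p) * p,
    ("NO_AWARD", "REFUSE"): (1 - p) * (1 - p),
}

# Case 2: Omega's output is perfectly correlated with the agent's action.
correlated = {
    ("AWARD", "PAY"): p,
    ("AWARD", "REFUSE"): 0.0,
    ("NO_AWARD", "PAY"): 0.0,
    ("NO_AWARD", "REFUSE"): 1.0 - p,
}

for name, joint in (("independent", indep), ("correlated", correlated)):
    print(f"{name}: P(o=AWARD | a=PAY) = {joint[('AWARD', 'PAY')] / p:.2f}")

# Same marginals, but the conditional is 0.51 in one case and 1.00 in
# the other: exactly the quantity the problem statement leaves open.
```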
Apply any of the standard fine print for Omega-based counterfactuals with respect to people who try to game the system with randomization. Depending on the version, that means a payoff of $0, a payoff of 0.51 × $1,000 = $510, or an outright punishment for being a nuisance.
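A toy enumeration of those variants, for concreteness (the convention names and the penalty size are my own illustrative assumptions):

```python
# What the 51/49 randomizer earns under each fine-print convention.

reward = 1_000   # the $1,000 payoff
p_pay = 0.51

payoffs = {
    "randomizing treated as refusal": 0,
    "paid in proportion": p_pay * reward,  # 0.51 * $1,000 = $510
    "punished as a nuisance": -reward,     # illustrative penalty size
}

for convention, payoff in payoffs.items():
    print(f"{convention}: ${payoff:,.0f}")
```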
I prefer this interpretation: P(a=X) means how sure the agent is that it will do X. If it flips a coin to decide between X and Y, then P(a=X) = P(a=Y) ≈ 0.5. If it has chosen to “just do X”, then P(a=X) ≈ 1. Omega, for his part, knows the agent’s surety and uses a randomizing device to match his own actions to it.
ETA: if interpreted naively, this leads to Omega rewarding agents with deluded beliefs about what they’re going to do. Maybe Omega shouldn’t look at the agent’s surety but at the surety of “a perfectly rational agent” in the same situation. I don’t have a real solution to this right now.
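A minimal sketch of the interpretation above together with the ETA’s loophole (the function names and the matching rule are my assumptions): if Omega’s randomizing device is calibrated only to the agent’s reported surety, the award rate tracks the report rather than the behavior, so a deluded agent gets over-rewarded.

```python
import random

def simulate(reported_surety: float, actual_p_pay: float, n: int = 100_000):
    """Return (award rate, actual pay rate) when Omega matches the report."""
    awards = sum(random.random() < reported_surety for _ in range(n))
    pays = sum(random.random() < actual_p_pay for _ in range(n))
    return awards / n, pays / n

# A calibrated coin-flipper: awarded about as often as it actually pays.
print(simulate(reported_surety=0.5, actual_p_pay=0.5))

# A deluded coin-flipper: certain it will pay but still flipping a coin.
# It is awarded ~100% of the time while paying only ~50% of the time.
print(simulate(reported_surety=1.0, actual_p_pay=0.5))
```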