I think we can improve the setup by conditioning only on things the AI has no control over.
Imagine that every turn, there is a random bit B sent along a wire. It’s almost certainly 1, but there’s a tiny chance that it’s 0.
If the button is pressed and B=0, then the agent’s utility doesn’t update. But if the button is pressed and B=1, the utility is updated as usual.
Except the expression is conditioned not on “press”, but on “B=0”.
Now we just have to define B in such a way that the AI can’t affect it—we need a defined source of true randomness.
Even better: B has already been calculated by some defined past process, and the AI just doesn’t know what it is yet.
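To make the update rule concrete, here is a rough Python sketch. It only illustrates two pieces: the bits being fixed in advance by a past process, and the utility changing only when the button is pressed and B=1. It does not model the conditioned expression itself. The names `EPSILON`, `draw_bits`, `utility_after_turn`, and the labels "U_N" and "U_S" are my own placeholders, and a seeded pseudo-random generator stands in for a true randomness source.

```python
import random

EPSILON = 0.01  # hypothetical value for the "tiny chance" that B = 0


def draw_bits(n_turns, seed, epsilon=EPSILON):
    """Precompute one bit B per turn before the agent acts.

    Assumption: a seeded PRNG stands in for the "defined past process";
    nothing the agent does later can change these values.
    """
    rng = random.Random(seed)
    return [0 if rng.random() < epsilon else 1 for _ in range(n_turns)]


def utility_after_turn(current, pressed, b, updated="U_S"):
    """Return the utility label in force after this turn.

    The utility is updated only if the button is pressed and B = 1.
    If B = 0, a press leaves the current utility unchanged.
    """
    if pressed and b == 1:
        return updated
    return current


bits = draw_bits(n_turns=1000, seed=0)  # fixed before any action is taken
```

For example, `utility_after_turn("U_N", pressed=True, b=0)` leaves the utility at "U_N", while the same call with `b=1` returns "U_S".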