Suppose I’m the counterfactual oracle. To every question I answer with K. Eventually Alice reads K, no matter how frequent E is. Then I get maximal reward. Am I missing something? Is the paper assuming that the oracle is incapable of long-term planning?
It assumes the reward is episodic, so it assumes that an Oracle maximising that reward has no interest in the long term. Also, if Alice is going to read K, the episode ends before she does so. Only in situations where Alice does not read K is the episode extended until the answer is known.
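For concreteness, here is a minimal sketch of that reward structure (the function and names are hypothetical illustrations, not the paper's formalism). The reward for a non-erased episode is fixed at 0 before Alice ever reads the answer, so always answering K earns nothing from her reading it:

```python
import random

def run_episode(oracle_answer, true_outcome_fn, erasure_prob=0.01):
    """One self-contained episode of a counterfactual-Oracle setup
    (hypothetical sketch, not the paper's formal definition)."""
    erased = random.random() < erasure_prob  # did the erasure event E occur?
    if not erased:
        # Alice will read the answer, but this episode's reward is
        # already fixed at 0 before she does; nothing she later does
        # with the answer feeds back into the quantity being maximised.
        return 0.0
    # Only when the answer is erased is the episode extended until the
    # true outcome is known, and the answer is scored against it.
    return -abs(oracle_answer - true_outcome_fn())
```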
But why is that a reasonable assumption to make? Aren’t you just assuming that the AI will play nice? I can see that there are some dangerous Oracles that we can protect against using your strategy, but there are also many that it wouldn’t hinder at all.
>Aren’t you just assuming that the AI will play nice?
I’m assuming that the reward/utility functions can be defined to be episodic. We hand the Oracle its utility, hence we can (in theory) construct it to be episodic (and train the Oracle in an episodic way, if we need to train it).
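To make "train the Oracle in an episodic way" concrete, here is one hypothetical training loop (reusing `run_episode` from the sketch above; the learner and parameters are made up for illustration). Each update sees only the reward earned inside its own episode, so a policy of always outputting K gains nothing from Alice reading K afterwards:

```python
import random

def episodic_training(questions_to_truths, n_episodes=10_000):
    """Hypothetical episodic hill-climbing loop (not the paper's algorithm).
    `questions_to_truths` maps each question to its eventual true outcome."""
    policy = {q: random.random() for q in questions_to_truths}
    for _ in range(n_episodes):
        q, truth = random.choice(list(questions_to_truths.items()))
        candidate = policy[q] + random.gauss(0, 0.1)  # propose a tweaked answer
        # Erasure probability is set high here only so this toy converges
        # quickly; the episodic point does not depend on its value.
        old_r = run_episode(policy[q], lambda: truth, erasure_prob=0.5)
        new_r = run_episode(candidate, lambda: truth, erasure_prob=0.5)
        if new_r > old_r:  # credit never crosses episode boundaries
            policy[q] = candidate
    return policy
```

A usage example (with a made-up question and numeric outcome) would be `episodic_training({"gdp_growth_next_year": 2.1})`.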