Suppose I’m the counterfactual oracle. To every question I answer with K. Eventually Alice reads K, no matter how frequent E is. Then I get maximal reward. Am I missing something? Is the paper assuming that the oracle is incapable of long-term planning?
It assumes the reward is episodic, so it assumes that an Oracle maximising that reward has no interest in the long term. Also, if Alice is going to read K, the episode ends before she does so. Only in situations where Alice does not read K is the episode extended until the answer is known.
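For concreteness, here is a minimal sketch of that reward structure (the function and names are hypothetical illustrations, not the paper's formalism). The reward for a non-erased episode is fixed at 0 before Alice ever reads the answer, so always answering K earns nothing from her reading it:

```python
import random

def run_episode(oracle_answer, true_outcome_fn, erasure_prob=0.01):
    """One self-contained episode of a counterfactual-Oracle setup
    (hypothetical sketch, not the paper's formal definition)."""
    erased = random.random() < erasure_prob  # did the erasure event E occur?
    if not erased:
        # Alice will read the answer, but this episode's reward is
        # already fixed at 0 before she does; nothing she later does
        # with the answer feeds back into the quantity being maximised.
        return 0.0
    # Only when the answer is erased is the episode extended until the
    # true outcome is known, and the answer is scored against it.
    return -abs(oracle_answer - true_outcome_fn())
```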
But why is that a reasonable assumption to make? Aren’t you just assuming that the AI will play nice? I can see that there are some dangerous Oracles that we can protect against using your strategy, but there are also many that it wouldn’t hinder at all.
>Aren’t you just assuming that the AI will play nice?
I’m assuming that the reward/utility functions can be defined to be episodic. We hand the Oracle its utility, hence we can (in theory) construct it to be episodic (and train the Oracle in an episodic way, if we need to train it).
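To make "train the Oracle in an episodic way" concrete, here is one hypothetical training loop (reusing `run_episode` from the sketch above; the learner and parameters are made up for illustration). Each update sees only the reward earned inside its own episode, so a policy of always outputting K gains nothing from Alice reading K afterwards:

```python
import random

def episodic_training(questions_to_truths, n_episodes=10_000):
    """Hypothetical episodic hill-climbing loop (not the paper's algorithm).
    `questions_to_truths` maps each question to its eventual true outcome."""
    policy = {q: random.random() for q in questions_to_truths}
    for _ in range(n_episodes):
        q, truth = random.choice(list(questions_to_truths.items()))
        candidate = policy[q] + random.gauss(0, 0.1)  # propose a tweaked answer
        # Erasure probability is set high here only so this toy converges
        # quickly; the episodic point does not depend on its value.
        old_r = run_episode(policy[q], lambda: truth, erasure_prob=0.5)
        new_r = run_episode(candidate, lambda: truth, erasure_prob=0.5)
        if new_r > old_r:  # credit never crosses episode boundaries
            policy[q] = candidate
    return policy
```

A usage example (with a made-up question and numeric outcome) would be `episodic_training({"gdp_growth_next_year": 2.1})`.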