I think we need to provide some kind of prior over the unknown features of the model and the reward if we want the given model and reward to mean anything. Otherwise, for all the AI knows, the true reward has a +2-per-step term that reverses the reward-over-time feature. It can still infer the algorithm generating the sample trajectories, but the known reward is no help at all in doing so.
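A toy numeric illustration of that failure mode (the -1-per-step stated reward and the +2 residual are assumptions, just to make the reversal concrete):

```python
# Suppose the stated reward charges -1 per step, so shorter trajectories
# look better. An unknown +2-per-step term in the true reward flips that
# preference entirely, so the stated reward alone pins nothing down.
stated_step_reward = -1.0
unknown_residual = 2.0  # hypothetical term the stated reward omits

for steps in (3, 10):
    stated_return = stated_step_reward * steps                     # -3 vs. -10
    true_return = (stated_step_reward + unknown_residual) * steps  # +3 vs. +10
    print(f"{steps} steps: stated {stated_return:+.0f}, true {true_return:+.0f}")
```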
I think what we want is for the stated reward to function as a hint. One interpretation might be that the stated reward should approximate the true reward well over the problem and solution domains humans have actually thought about. This works in, for instance, the case where you put an AI in charge of a paper clip factory with the stated reward ‘+1 per paper clip produced’.
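One way to make that interpretation concrete, as a minimal sketch (the feature split, the likelihood, and all names here are assumptions for illustration): treat the stated reward as evidence that is likely to agree with the true reward on the features humans have considered, and update a prior over candidate true rewards accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: reward is linear in features, and the first
# n_familiar features cover situations humans have actually thought about.
n_features, n_familiar = 6, 3
stated_w = np.zeros(n_features)
stated_w[0] = 1.0  # '+1 per paper clip produced'

# Broad prior over candidate true reward weights.
candidates = rng.normal(0.0, 1.0, size=(10_000, n_features))

# Likelihood of the stated reward given a candidate: high when the two
# agree on the familiar features, indifferent to the unfamiliar ones.
sq_err = np.sum((candidates[:, :n_familiar] - stated_w[:n_familiar]) ** 2, axis=1)
posterior = np.exp(-sq_err / 0.1)
posterior /= posterior.sum()

# Posterior mean: pinned near the stated weights on familiar features,
# still near the prior mean (zero) on the unfamiliar ones.
print(np.round(posterior @ candidates, 2))
```

The posterior mean recovers the stated weight on the familiar features but stays near the prior on the unfamiliar ones, which is the behavior the 'hint' reading asks for: confident where humans have looked, uncertain elsewhere.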
Indeed. But I want to see if I can build up to this in the model.