It looks closer to the Value Learning Agent in that paper to me and maybe can be considered an implementation / specific instance of that?
Yes. What the value learning agent doesn’t specify is what constitutes observational evidence of the utility function, or, in this notation, how to calculate P^π_{s0,prior,u} and thereby calculate w(u | h_{<t}). So this construction makes a choice about how to specify how the true utility function becomes manifest in the agent’s observations. A number of simpler choices don’t seem to work.
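To make the shape of this concrete: one natural reading of "calculate P^π_{s0,prior,u} and thereby w(u | h_{<t})" is a Bayes update over a set of candidate utility functions, where each candidate induces a likelihood over observation histories. The sketch below is purely illustrative, not taken from the paper; the function and variable names are hypothetical, and a real construction would have to specify the likelihood model, which is exactly the choice under discussion.

```python
# Hypothetical sketch: w(u | h) ∝ w(u) * P^π_{s0,prior,u}(h), assuming
# each candidate utility function u induces a likelihood over observation
# histories. All names here are illustrative, not from the paper.

from typing import Callable, Dict, Hashable, Sequence

History = Sequence[Hashable]

def posterior_over_utilities(
    prior: Dict[str, float],                      # w(u): prior weight per candidate utility
    likelihood: Callable[[str, History], float],  # stands in for P^π_{s0,prior,u}(h)
    history: History,
) -> Dict[str, float]:
    """Return w(u | h) by Bayes' rule over a finite candidate set."""
    unnorm = {u: prior[u] * likelihood(u, history) for u in prior}
    z = sum(unnorm.values())
    if z == 0.0:
        raise ValueError("history has zero probability under every candidate")
    return {u: p / z for u, p in unnorm.items()}

# Toy example: two candidates, with the history more likely under u1.
prior = {"u1": 0.5, "u2": 0.5}

def likelihood(u: str, h: History) -> float:
    return 0.8 ** len(h) if u == "u1" else 0.2 ** len(h)

post = posterior_over_utilities(prior, likelihood, ["obs1", "obs2"])
```

The hard part, which this sketch deliberately leaves abstract, is the `likelihood` term: the simpler choices the comment alludes to would be particular ways of filling it in.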