I like this example of “works in practice but not in theory.” Would you associate “ambitious value learning vs. adequate value learning” with “works in theory vs. doesn’t work in theory but works in practice”?
One way that “almost rational” is much closer to optimal than “almost anti-rational” is ye olde dot product, but a more accurate description of this case would involve dividing up the model space into basins of attraction. Different training procedures will divide up the space in different ways—this is actually sort of the reverse of a Monte Carlo simulation, where one of the properties you might look for is ergodicity (eventually visiting all points in the space).
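To make the dot-product point concrete, here is a toy sketch (all names and numbers here are illustrative, not from the original discussion): if “rational” is a direction in model space, a small perturbation of that direction stays close to it, while a small perturbation of the reversed direction stays close to the opposite.

```python
import numpy as np

# Illustrative only: treat "rational" as a unit direction in a model space.
rng = np.random.default_rng(0)
rational = rng.normal(size=50)
rational /= np.linalg.norm(rational)

noise = 0.05 * rng.normal(size=50)
almost_rational = rational + noise        # small perturbation of the optimum
almost_anti_rational = -rational + noise  # small perturbation of the reversed direction

def cosine(u, v):
    """Cosine similarity: +1 means same direction, -1 means opposite."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(almost_rational, rational))       # near +1
print(cosine(almost_anti_rational, rational))  # near -1
```

By the dot-product measure, “almost anti-rational” is about as far from optimal as you can get, even though in edit distance it is only a sign flip away. The basins-of-attraction picture refines this: what matters is not the raw similarity but which region of the space a training procedure flows toward from that starting point.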
Potentially. I think the main question is whether adequate value learning will work in practice.