RyanCarey comments on How to judge moral learning failure

RyanCarey 9 Dec 2016 1:01 UTC
0 points
AF

I’m thinking of modelling this as classical moral uncertainty over plausible value/reward functions in a set R={Ri}, but assuming that the probability of a given Ri is never assumed to go below a certain probability.

It’s surprising to me that you would want the probability of each reward function to stay bounded away from zero, even asymptotically. In regular bandit problems, if the probability of selecting some suboptimal action never approaches zero, then you will keep making that mistake forever, incurring linear regret. The same should be true, for some suitable definition of regret, if you stubbornly continue to behave according to some “wrong” moral theory.
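
To make the bandit point concrete, here is a minimal sketch under assumptions that are mine rather than the original post's: two arms whose expected rewards differ by a fixed gap of 0.4, and an idealized policy that pulls the worse arm with probability max(floor, 1/t) at step t. The function name `expected_regret` and all the numbers are hypothetical. With a positive floor, cumulative expected regret grows roughly in proportion to the horizon; with no floor, it grows only logarithmically.

```python
def expected_regret(horizon, gap, floor):
    """Cumulative expected regret over `horizon` steps when the worse arm
    is pulled with probability max(floor, 1/t) at step t.

    `gap` is the assumed difference in expected reward between the arms.
    """
    total = 0.0
    for t in range(1, horizon + 1):
        p_wrong = max(floor, 1.0 / t)  # chance of picking the worse arm
        total += p_wrong * gap         # expected loss incurred at this step
    return total


GAP = 0.4  # illustrative value only

for horizon in (1_000, 10_000, 100_000):
    floored = expected_regret(horizon, GAP, floor=0.1)   # probability never below 0.1
    decaying = expected_regret(horizon, GAP, floor=0.0)  # probability shrinks as 1/t
    print(horizon, round(floored, 1), round(decaying, 1))
```

Multiplying the horizon by ten multiplies the floored regret by about ten, while the decaying version barely moves. This only illustrates the standard bandit notion of regret; whether that notion is the right yardstick for moral theories is a separate question.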
- Stuart_Armstrong 12 Dec 2016 16:10 UTC
  0 points
  AF Parent
  But I’m arguing that using these moral theories to assess regret is the wrong thing to do.