Identifiability of the optimal policy seems too strong: it’s basically fine if my household robot doesn’t figure out the optimal schedule for cleaning my house, as long as it’s cleaning it somewhat regularly. But I agree that conceptually we would want something like that.
Identifiability of the optimal policy seems too strong: it’s basically fine if my household robot doesn’t figure out the optimal schedule for cleaning my house, as long as it’s cleaning it somewhat regularly. But I agree that conceptually we would want something like that.