I mostly agree with what Paul has said. I probably don’t have a fundamental disagreement with you either; I think creating a theory of values for computationally bounded agents is useful. I just expect that you can’t directly turn your definition into a way of inferring human values robustly; that will require a lot more work. It’s almost certainly an important component of a full solution to value learning, and might also be useful in partial solutions.