Cool, this seems similar to some stuff I’ve been thinking about lately. If I get the meaning of this right, then it’s essentially uniform convergence guarantees in statistical learning theory applied to moral problems, right?
The main problem here is generalization to a different distribution over problems. I don't know the best way to handle this, but one idea is to use similar statistical techniques to detect differences between the training and test sets. Specifically, if you have a set of hypotheses for what the test distribution P is, then given enough data you can use null hypothesis significance testing to determine whether the test P differs from the training P, and you could probably use that to bound the error.
Roughly, we want something like this:
We have some set of possible distributions P
We get training data from some unknown distribution P_train, and observe the test problems (but not their answers, of course) from some unknown distribution P_test
We test the null hypothesis that P_train = P_test against the alternative hypothesis that P_train ≠ P_test
If the test says they’re different, alert the human and ask them to label the new data
If the test fails to show that they're different, then (assuming the test has enough power) the distributions can't be too different, so (I think) the old uniform convergence guarantees should mostly still apply
And with significance level α for the NHST, we falsely alert the human at most about an α fraction of the time when P_train = P_test (business as usual)
I haven’t worked out the math, but it seems like there should be something useful here.
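To make the loop concrete, here's a minimal sketch in Python. It assumes each problem can be summarized by a single real-valued feature, and it uses scipy's two-sample Kolmogorov–Smirnov test as a stand-in for whatever two-sample test actually fits the problem representation; the names here (detect_shift, ALPHA) are just illustrative.

```python
# Minimal sketch of the shift-detection loop described above.
# Assumption (mine, not from the discussion): each problem is
# summarized by one real-valued feature, so a two-sample
# Kolmogorov-Smirnov test can compare P_train and P_test.
import numpy as np
from scipy.stats import ks_2samp

ALPHA = 0.05  # significance level for the NHST


def detect_shift(train_features: np.ndarray, test_features: np.ndarray) -> bool:
    """Test H0: P_train = P_test against H1: P_train != P_test.

    Returns True when H0 is rejected, i.e. when we should alert
    the human and ask them to label the new data.
    """
    result = ks_2samp(train_features, test_features)
    return result.pvalue < ALPHA


# Toy usage: with no shift we falsely alert ~ALPHA of the time;
# with a real shift we alert once there is enough test data.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5000)
same = rng.normal(loc=0.0, scale=1.0, size=1000)
shifted = rng.normal(loc=0.5, scale=1.0, size=1000)

print("alert on same distribution?   ", detect_shift(train, same))
print("alert on shifted distribution?", detect_shift(train, shifted))
```

For the real problem representation (presumably high-dimensional) you'd want to swap in a test that can handle it, e.g. a kernel MMD test or a classifier two-sample test, but the alert-the-human logic stays the same.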