I suspect that reaching into the human concept space is going to be helpful for idealizing human concepts even if we don’t automatically get extrapolated answers in edge cases. Specifically, people have some sort of concept of what it means for a decision process to be better or worse. For example, if you ask me whether some weirdtopia is good, I might have no idea, but I could probably say something about how the question ought to be decided (for example, thinking about it for longer is likely to give a better answer than thinking about it briefly), and there are unknown abstract principles behind my judgments about decision procedures that would be useful to learn.
By “regularize model selection so that they don’t include edge cases of this sort”, do you mean building models that are confident about some moral judgments but not about edge cases (or that don’t even have edge cases in their domain)? I think something like this is a good idea, and ideally we would have high confidence about decision procedures for exactly the object-level questions we have low confidence about.
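To illustrate one possible reading of that, here’s a minimal sketch in Python; the feature-space representation, the `AbstainingJudge` class, and the distance threshold are all invented for the example rather than anything proposed in this thread. The idea is a model that offers a judgment only when a query is close to cases it has already seen, and abstains on anything that looks like an edge case:

```python
import math
from typing import List, Optional, Tuple


class AbstainingJudge:
    """Toy model that judges familiar cases and abstains on edge cases."""

    def __init__(self, training: List[Tuple[List[float], bool]], radius: float):
        self.training = training  # (feature vector, moral judgment) pairs
        self.radius = radius      # beyond this distance, refuse to judge

    def judge(self, query: List[float]) -> Optional[bool]:
        # Nearest-neighbor lookup; distance to the nearest known case
        # stands in for "how much of an edge case is this?".
        dist, label = min(
            (math.dist(query, feats), judgment) for feats, judgment in self.training
        )
        return label if dist <= self.radius else None  # None = no confident judgment


# Two familiar cases and one far-away "weirdtopia" query.
judge = AbstainingJudge(training=[([0.0, 0.0], True), ([1.0, 0.0], False)], radius=0.5)
print(judge.judge([0.1, 0.1]))  # True: close to a known case
print(judge.judge([5.0, 5.0]))  # None: an edge case, so the model abstains
```

On this reading, the regularizer is whatever keeps the model from extrapolating past the region where its judgments are trustworthy.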
I think it would be really nice to have a mathematical toy model in which “people believe X, but they think decision procedure Y is good, and Y outputs not-X” can be expressed, and to see how this relates to concept learning.
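For what it’s worth, here’s a very rough first pass at such a toy model in Python. The hidden “idealized” credence inside `deliberate` and the trust-weighted mixing rule in `learner_credence` are arbitrary placeholders, not serious modeling choices:

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Human:
    direct_credence: Dict[str, float]   # "people believe X": gut-level P(X)
    procedure_trust: Dict[str, float]   # "they think Y is good": P(Y tracks truth)


def deliberate(prop: str, steps: int, seed: int = 0) -> float:
    """Toy procedure Y: average noisy samples around a hidden 'idealized'
    credence, so more steps gives a more reliable answer. The hidden value
    is an assumption of this sketch, standing in for whatever judgment
    extended deliberation would converge to."""
    rng = random.Random(seed)
    idealized = {"weirdtopia is good": 0.2}.get(prop, 0.5)
    samples = [min(1.0, max(0.0, rng.gauss(idealized, 0.3))) for _ in range(steps)]
    return sum(samples) / len(samples)


def learner_credence(human: Human, prop: str,
                     procedure: Callable[[str, int], float]) -> float:
    """One naive way a concept learner might reconcile the conflict: mix the
    direct judgment with the procedure's output, weighted by trust in Y."""
    w = human.procedure_trust["deliberate"]
    return (1 - w) * human.direct_credence[prop] + w * procedure(prop, 1000)


human = Human(
    direct_credence={"weirdtopia is good": 0.7},  # people believe X
    procedure_trust={"deliberate": 0.9},          # but endorse procedure Y
)
print(deliberate("weirdtopia is good", 1000))                     # well below 0.5: Y outputs not-X
print(learner_credence(human, "weirdtopia is good", deliberate))  # pulled toward Y's verdict
```

The concept-learning question is then which signal, the direct judgment or the endorsed procedure’s output, the learner should treat as better evidence about the underlying concept.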