Having thought about it more (hopefully with more clarity), I think I have trouble imagining training data for f+ that:
1. We’re highly confident is correct.
2. Enables the model to decide which true things to output in general. (my (2) here)
It seems to me that we can be highly confident about matters of fact (how many chairs are in this room...), but less confident once value judgements come into play (which of A or B is the better answer to “How should I go about designing a chair?”). [Of course it’s not black-and-white: one can make a philosophical argument that all questions are values questions. However, I think this is an issue even if we stick to pragmatic, common-sense approaches.]
I don’t think we can remedy this for values questions by including only data that we’re certain of. It seems to me that this works for facts questions because of the structure of the world: the world is so hugely constrained by physical law that you can get an extremely good model by generalizing from sparse data drawn from a different distribution.
It’s not clear that anything analogous works for generalizing preferences (maybe?? but I’d guess not). I’d expect an f+ trained on [data we’re highly confident is correct] to generalize poorly to general open questions.
Similarly, in Paul’s setup I think the following condition will fail if we need to be highly confident of the correctness (relative to what is known) of the small dataset:
The small dataset is still rich enough that you could infer correct language usage from it, i.e. the consistency condition on the small dataset alone suffices to recover all 10,000 bits required to specify the intended model.
It’s entirely plausible that you can learn “correct language usage” in the narrow sense from consistency on the small dataset (i.e. you may infer a [deduced_statement → natural_language_equivalent] mapping). I don’t think it’s plausible that you learn it in the sense required (i.e. a [(set_of_all_deduced_statements, Q) → natural_language_answer] mapping).
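To make the gap between those two senses concrete, here is a minimal sketch of the two signatures as Python-style type aliases (all names here are my own illustrative placeholders, not anything from Paul's setup):

```python
from typing import Callable, FrozenSet

# Illustrative placeholder types -- none of these names come from Paul's setup.
DeducedStatement = str           # one statement produced by the deduction process
NaturalLanguageStatement = str   # its natural-language rendering
Question = str
NaturalLanguageAnswer = str

# The narrow sense: translate a single deduced statement into natural language.
# Consistency on the small dataset might plausibly pin this mapping down.
TranslateStatement = Callable[[DeducedStatement], NaturalLanguageStatement]

# The sense actually required: given everything that has been deduced plus a
# question, decide which true things are worth saying, i.e. produce a good answer.
AnswerQuestion = Callable[
    [FrozenSet[DeducedStatement], Question], NaturalLanguageAnswer
]
```

The first mapping only has to preserve meaning statement-by-statement; the second also has to make the selection/value judgement about which true things belong in the answer, which is exactly the part I doubt the small dataset pins down.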
Again, perhaps I’m (not even) wrong, but I think the above accurately describes my current thinking.