This is really handy. I didn’t have much to say, but revisited this recently and figured I’d write down the thoughts I did think.
My general feeling about human models is that they need precisely one more level of indirection than this. Too many levels of indirection, and you get something that correctly predicts the world, but doesn’t contain something you can point to as the desires. Too few, and you end up trying to fit human examples with a model that doesn’t do a good job of fitting human behavior.
For example, if you build your model on responses to survey questions, then what about systematic human difficulties in responding to surveys (e.g. difficulty using a consistent scale across several orders of magnitude of value) that the humans themselves are unaware of? I’d like to use a model of humans that learns about this sort of thing from non-survey-question data.
This is really handy. I didn’t have much to say, but revisited this recently and figured I’d write down the thoughts I did think.
My general feeling about human models is that they need precisely one more level of indirection than this. Too many levels of indirection, and you get something that correctly predicts the world, but doesn’t contain something you can point to as the desires. Too few, and you end up trying to fit human examples with a model that doesn’t do a good job of fitting human behavior.
For example, if you build your model on responses to survey questions, then what about systematic human difficulties in responding to surveys (e.g. difficulty using a consistent scale across several orders of magnitude of value) that the humans themselves are unaware of? I’d like to use a model of humans that learns about this sort of thing from non-survey-question data.