Ah yes, the trivial model of humans that says “whatever they do, that’s what they want.”
That’s a different type of trivial solution, but the idea is broadly similar: because the space of reasonable planners is so large, there are many ways to slice the same policy into reward + bias. (As Paul puts it, the easy reward inference problem is still hard.)
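Here’s a toy sketch of that degeneracy (the setup and all the names are mine, purely illustrative): given one observed policy, at least three different (planner, reward) pairs reproduce it exactly, so the behavioral data alone can’t pick out the “true” reward.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3

# Observed behavior: the action the human takes in each state.
policy = rng.integers(n_actions, size=n_states)

# Decomposition 1: fully rational planner + reward peaked on the chosen actions.
reward_a = np.zeros((n_states, n_actions))
reward_a[np.arange(n_states), policy] = 1.0
planner_a = lambda R: R.argmax(axis=1)      # "maximize reward"

# Decomposition 2: anti-rational planner + the negated reward.
reward_b = -reward_a
planner_b = lambda R: R.argmin(axis=1)      # "minimize reward"

# Decomposition 3: reward-indifferent planner that just memorizes the behavior.
reward_c = np.zeros((n_states, n_actions))  # carries no information about values
planner_c = lambda R: policy                # ignores the reward entirely

for planner, R in [(planner_a, reward_a),
                   (planner_b, reward_b),
                   (planner_c, reward_c)]:
    assert np.array_equal(planner(R), policy)  # all three fit the data perfectly
```

And the anti-rational and indifferent decompositions aren’t meaningfully more complex than the rational one, which is why a simplicity prior alone doesn’t break the tie in the Armstrong/Mindermann result.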
Is there some argument, analogous to a type-signature argument, that would rule out such poorly-generalizing approaches, though?
I don’t think a type signature would help: the problem is that the class of planners/biases and the class of rewards are far too unconstrained. We probably need to better understand what human planners actually are via cognitive science (or do a better job translating between AI concepts and human concepts).
That said, there’s a paper I’ve been putting off writing for two years now, on why the Armstrong/Mindermann NFL theorem is kind of silly and how to get around it; maybe I’ll write it at some point :)