Humans are notoriously bad at explaining the real reasons why we do what we do, so accepting their words as quality feedback seems counterproductive. The feedback need not be ignored, but rather treated as just another source of information, in the same way that lies and misguided ideas are a source of information about the person expressing them.
Agreed, but the hard question seems to be how you interpret that feedback, given that you can’t interpret it literally.
A reward function would not be anything explicit, but a sort of Turing test (a Pinocchio test?): fitting in and being implicitly recognized as a fellow human.
FYI, this sounds like imitation learning.
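Specifically, the "Pinocchio test" reading resembles adversarial imitation learning (GAIL-style): the reward is not written down explicitly, but is the output of a discriminator that tries to tell the agent's behavior apart from human behavior. Here is a minimal toy sketch of that idea; the discriminator is a fixed stand-in rather than a learned network, and all names are illustrative, not from any particular library:

```python
import math
import random

random.seed(0)

# Hypothetical "human" behavior: actions cluster tightly around the state.
def human_action(state):
    return state + random.gauss(0, 0.1)

# Agent behavior with an adjustable bias; bias = 0 imitates the human.
def agent_action(state, bias):
    return state + bias + random.gauss(0, 0.1)

# Stand-in for a learned discriminator: probability that a (state, action)
# pair came from a human, modeled as a Gaussian likelihood around the
# human's typical action.
def discriminator(state, action):
    d = action - state
    return math.exp(-d * d / (2 * 0.1 ** 2))

# GAIL-style implicit reward: high when the discriminator cannot
# distinguish the agent from a human (D close to 1), low otherwise.
def implicit_reward(state, action):
    p = max(min(discriminator(state, action), 1 - 1e-6), 1e-6)
    return math.log(p)

# An agent that "fits in" earns more implicit reward than one that
# stands out, even though no explicit objective was ever specified.
states = [random.uniform(-1, 1) for _ in range(1000)]
r_blend = sum(implicit_reward(s, agent_action(s, 0.0)) for s in states) / len(states)
r_odd = sum(implicit_reward(s, agent_action(s, 0.5)) for s in states) / len(states)
print(r_blend > r_odd)
```

In the full method the discriminator would be trained jointly with the agent, so "being recognized as a fellow human" is a moving target rather than a fixed formula, which matches the original point that the reward is implicit.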