A key problem here is that if we use a human as the evaluator, the agent assigns 0 prior probability to the truth: the human won’t be able to update beliefs as a perfect Bayesian, sample a world-state history from his beliefs, and assign a value to it according to a utility function. For a Bayesian reasoner that assigns 0 prior probability to the truth, God only knows how it will behave, even in the limit. (Unless there is some very odd utility function such that the human could be described in this way?)
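To make the zero-prior point concrete, here is a minimal sketch (the coin-flip setup and all the numbers are my own illustration, not anything from the agent design itself). Bayes' rule multiplies the prior by the likelihood, so a hypothesis that starts at prior 0 stays at posterior 0 no matter how much evidence piles up in its favor:

```python
import numpy as np

def bayes_update(prior, likelihoods):
    """Posterior ∝ prior × likelihood; a zero-prior hypothesis can never recover."""
    unnorm = prior * likelihoods
    return unnorm / unnorm.sum()

# Three hypotheses about a coin's bias; hypothesis 2 (bias = 0.9) is the truth,
# but the agent's prior rules it out entirely.
biases = np.array([0.3, 0.5, 0.9])
prior = np.array([0.5, 0.5, 0.0])  # zero prior probability on the truth

rng = np.random.default_rng(0)
flips = rng.random(1000) < 0.9  # data actually generated by the true bias 0.9

posterior = prior
for heads in flips:
    likelihoods = biases if heads else 1 - biases
    posterior = bayes_update(posterior, likelihoods)

print(posterior)  # the truth (index 2) still has posterior 0.0 after 1000 flips
```

The posterior just concentrates on whichever live hypothesis is least wrong (here, bias 0.5), which is why it's so hard to say anything about limiting behavior.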
But maybe this problem could be fixed if the agent takes some more liberties in modeling the evaluator. Maybe once we have a better understanding of bounded approximately-Bayesian reasoning, the agent could model the human as a bounded reasoner rather than a perfect Bayesian, which might allow it to assign a strictly positive prior to the truth.
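Continuing the illustrative sketch above, even a tiny positive prior on the truth is enough for the posterior to converge to it, which is the payoff a more permissive model of the evaluator would buy us here (again, the numbers are mine):

```python
# Same setup as before, but the agent now assigns a small positive prior to the truth.
prior = np.array([0.499, 0.499, 0.002])

posterior = prior
for heads in flips:
    likelihoods = biases if heads else 1 - biases
    posterior = bayes_update(posterior, likelihoods)

print(posterior)  # posterior mass now concentrates on the true bias 0.9
```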
And all this said, I don’t think we’re totally clueless when it comes to guessing how this agent would behave, even though a human evaluator would not satisfy the assumptions that the agent makes about him.