Your explanation about the short-term planner optimizing against the long-term planner seems to suggest we should only see motivated reasoning in cases where there is a short-term reward for it.
It seems to me that motivated reasoning also occurs in cases like gamblers thinking their next lottery ticket has positive expected value, or competitors overestimating their chances of winning a competition, where there doesn’t appear to be a short-term benefit (unless the belief itself somehow counts as a benefit). Do you posit a different mechanism for these cases?
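To make the lottery case concrete, here is a toy calculation with entirely made-up odds and prices (nothing here comes from the post), illustrating why "my next ticket is +EV" is a belief error rather than a short-term payoff:

```python
from fractions import Fraction

# Hypothetical lottery (invented numbers): $2 ticket, $1,000,000 jackpot,
# 1-in-1,000,000 odds. Using Fraction for exact arithmetic.
ticket_price = 2
jackpot = 1_000_000
win_probability = Fraction(1, 1_000_000)

expected_value = win_probability * jackpot - ticket_price
print(expected_value)  # -1: each ticket loses a dollar in expectation
```

So under any realistic parameters the ticket's expected value is negative, and the gambler's contrary belief doesn't obviously pay off in the short term either.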
I’ve been thinking for a while that motivated reasoning rhymes with reward hacking, and might arise any time a generator-part Goodharts an evaluator-part. Might your short-term and long-term planners be one example of this pattern?
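The generator-Goodharting-an-evaluator pattern I have in mind can be sketched with a deliberately silly, deterministic toy (the word-count proxy and both "essays" are my inventions, not anything from the post): the evaluator scores outputs by a cheap proxy, and the generator, optimizing that proxy, drifts away from what the evaluator actually cares about.

```python
# Toy sketch of a generator Goodharting an evaluator (hypothetical setup):
# the evaluator uses word count as a proxy for essay quality, so the
# generator wins by producing padding rather than content.

def evaluator_proxy(essay):
    # Proxy the evaluator can compute cheaply: longer looks better.
    return len(essay.split())

def true_quality(essay):
    # Stand-in for ground truth: count of distinct informative words.
    return len(set(essay.split()))

candidates = [
    "brief but dense argument",          # genuinely good output
    "the the the the the the the the",   # generator's padded output
]

best = max(candidates, key=evaluator_proxy)
print(best)                # the padded essay wins on the proxy
print(true_quality(best))  # 1 -- but its true quality is minimal
```

Motivated reasoning would then be the special case where the "proxy" is the agent's own belief-based evaluation, and the generator learns to manipulate the belief rather than the world.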
I’ve also wondered if children covering their eyes when they get scared might be an example of the same sort of reward hacking (instead of eliminating the danger, they just eliminate the warning signal from the danger-detecting part of themselves by denying it input).