I agree with all of that. I want to chip in on the brain mechanisms and the practical implications because it’s one of my favorite scientific questions. I worked on it as a focus question in computational cognitive neuroscience, because I thought it was important in a practical sense. I also think it’s somewhat important for alignment work, because difficult-to-resolve questions are more subject to motivated reasoning (MR) as a tiebreaker; more on this at the end.
The mechanism is only important to the degree that it gives us clues about how MR affects important discussions and conclusions; I think it gives some. In particular, MR isn’t limited to seeking social approval; a thought can “sound good” just to you, for highly idiosyncratic reasons. Countering MR requires noticing what feels good for you to believe, and working against that, which is swimming upstream in a pretty difficult way. Or you can counteract it by learning to really love being wrong; that’s tough too.
So here’s a shot at briefly describing the brain mechanisms. We use RL of some stripe to choose actions. This has been studied relatively thoroughly, so we’re pretty clear on the broad outlines but not the details. That system seems to have been adapted for use in selecting “internal actions,” which roughly means selecting thoughts. Reusing existing machinery like that makes sense from an evolutionary perspective, and brain anatomy suggests this adaptation pretty strongly.
It’s a lot tougher to judge which thoughts reliably lead to reward, so we make a lot of mistakes. I think that’s what Steve means by searching for thoughts that seem good. That’s what produces motivated reasoning. Sometimes it’s useful; sometimes it’s not.
There’s some other interesting stuff about the way the critic/dopamine system works; I think it’s allowed to use the full power of the rest of the brain to predict rewards. And it’s only grounded in reality when it’s proven wrong, which doesn’t happen all that often in complex domains like “should alignment be considered very hard?” Steve describes the biology of the reward-prediction system in [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering. (That post is a really good overview, in addition to tying the biology to alignment.) He goes into much more detail in the Valence sequence he linked above. There he doesn’t mention the biology at all, but it matches my views on the function of the dopamine system in humans perfectly.
In sum, the brain gets to use as much of its intelligence as it wants (including long system-2 contemplation) to guess how “good” (reward-predictive) each thought/concept/idea/plan/belief is. Those guesses can be proven wrong in two ways, both fairly rare, so on average a lot of bad speculation sticks around and causes motivated reasoning. On rare occasions you get direct and fast feedback (someone you respect telling you that’s stupid when you voice the thought); on other rare occasions, you put in the time to work backward from such feedback, and your valence estimates of all the plans and beliefs involved don’t prevent you from reaching the truth.
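The dynamic above can be sketched as a toy model (my construction, purely illustrative; the “thoughts,” values, and update rule are all assumptions, not claims about actual neural parameters): learned valence estimates select which thought gets entertained, and an estimate only gets corrected on the rare occasions it meets grounding feedback.

```python
def td_update(estimate, reward, lr=0.5):
    """One temporal-difference-style step: nudge the estimate toward observed reward."""
    return estimate + lr * (reward - estimate)

# Initial valence guesses ("sounds good") vs. the actual reward if tested against reality.
valence = {"flattering_idea": 0.9, "uncomfortable_truth": 0.3}
true_reward = {"flattering_idea": 0.2, "uncomfortable_truth": 0.8}

history = []
for step in range(200):
    # Entertain whichever thought currently "sounds good" (highest valence).
    thought = max(valence, key=valence.get)
    history.append(thought)
    # Grounding feedback is rare: only 1 step in 50 tests the thought against reality.
    if step % 50 == 49:
        valence[thought] = td_update(valence[thought], true_reward[thought])

print(history.count("flattering_idea"))  # prints 150: the overvalued thought dominates
print(valence)
```

Because the flattering idea starts out overvalued and feedback is sparse, it keeps getting selected for 150 of 200 steps before its valence is finally corrected below the alternative; the uncomfortable truth’s undervaluation, meanwhile, persists uncorrected the whole time it goes unselected.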
Of note, the explanation that people tend toward beliefs and reasoning that sound good (predict reward) is one of the oldest accounts of motivated reasoning, formulated in different ways. It is often framed as “we do this because it’s adaptive,” which I agree is wrong, but some of the original formulations, predating the neuroscience, were essentially what Steve is saying: we choose thoughts that sound good, and we’re often wrong about what’s good.
From this perspective, identifying as a rationalist provides some resistance to motivated reasoning, but not immunity. Rationalist ideals provide a counter-pressure to the extent you actually feel good about discovering that you were wrong, and so seek out that possibility in your thought-search. But we shouldn’t assume that identifying as a rationalist means being immune to motivated reasoning; the tendency to feel good when you can think you’re right and others are wrong is pretty strong.
Sorry to give so much more than the OP asked for. My full post on this is perpetually stuck in draft form and never first priority. So I thought I’d spit out some of it here.
I wrote about the brain mechanisms, and the close analogy between the basal ganglia circuits that choose motor actions based on dopamine reward signals and the circuits through prefrontal cortex that seem to approximately select mental actions, in Neural mechanisms of human decision-making, but I can’t highly recommend it. Co-authoring with mixed incentives is always a mess, and I felt conflicted about the brain-like AGI capability implications of describing things really clearly, so I didn’t try hard to clean up that mess. But the general story and references to the known biology are there. Steve’s work in the Valence sequence nicely extends that to explaining not only motivated reasoning, but how valence (which I take to be reward prediction in a fairly direct way) produces our effective thinking. To a large degree, reasoning accurately is a lucky side-effect of choosing actions that predict reward, even at a great remove. Motivated reasoning, even in its harmful forms, is the downside: large in absolute terms, but relatively small compared to that benefit.
I think motivated reasoning (often overlapping with confirmation bias) is the most important cognitive bias in practical terms, particularly when combined with the cognitive limitation that we just don’t have time to think about everything carefully. As I mentioned, this seems very important as a problem in the world at large, and perhaps particularly for alignment research. People disagree about important but difficult-to-verify matters like ethics, politics, and alignment, and everyone truly believes they’re right because they’ve spent a bunch of time reasoning about it. So they assume their opponents are either lying or haven’t spent time thinking about it, and distrust and arguments abound.