I think this describes the fabric of thought in a very important way. I’ve been applying this lens to evaluating my own thinking, and others’ discussions and arguments, for years now, and it seems to fit very well.
I think the biggest application to alignment discourse is in applying this principle to understand and reduce our own biases.
I think rationalists are resistant to confirmation bias (arguably by far the most destructive bias), because rationalists place a high valence on changing their minds. We will seek evidence that contradicts our current beliefs because we actually like changing our minds.
But this doesn’t make us immune to confirmation bias, or to the closely related motivated reasoning. We still dislike being proven wrong, particularly in public. And we attach valence to concepts in a way that distorts our thinking.
I think there’s a strong halo/horns effect for concepts as well as for people. If you like the orthogonality thesis as a concept, you probably like instrumental convergence as well, because those concepts tend to co-occur on the same side of an argument. (That liking is closely related to but not the same as thinking a thesis is true.)
If we have a low valence for “alignment optimism”, we’ll tend to assign low valence to claims associated with it. Similarly for “alignment pessimism”. Over time, this effect creates polarization between viewpoints, and a slide toward antagonism rather than good-faith exploration of evidence and arguments.