Typically, to measure bias on a particular issue you do not need to ask questions about that issue directly. For example, biases about the current war in Ukraine are strongly correlated with biases about US domestic issues. So it would be impossible to keep an LLM's bias about Ukraine hidden simply by removing all Ukraine-related questions.
Doesn’t that mean I’m now just motivated to attack whole clusters of correlated questions? And for that matter, doesn’t it mean that if, say, I care most about defending bias on Ukraine, I have an incentive to collude with others involved in the process who care more about the domestic issues? My opponents have the same incentives, so it seems to me you’re at great risk of importing all of the outside factions into the pool of people selecting the questions.
However, in practice, most arguments about inequality focus on its social consequences, which is where the bias manifests itself.
I dunno. I agree people argue based on consequences, but I also think that there’s a lot more feed-forward than anybody would like to admit. If I’m fundamentally in favor of inequality, then I’m motivated to go confirmation-bias myself into believing it has more positive consequences and fewer negative ones.
Of course I’ll then use those beliefs to argue for more inequality… but even if I’m forced to give up one or another belief, that doesn’t mean I’ll reexamine my underlying pro-inequality values, and I probably have a bunch of other similar beliefs on tap. If I’m a pro-inequality advocate, friends and I probably spend a fair amount of time sitting around thinking of new advantages of inequality, and/or new disadvantages of equality.
And, going back to the question-selection issue, it doesn’t seem unlikely that I’ll try to defend my beliefs about the consequences of inequality by trying either to stop anybody from going out and actually measuring outcomes, or to bias the measurements in one way or another. While my friends and I are thinking up those new consequences, we’re probably also on the lookout for high-quality metrics that prove them, as opposed to any obviously bogus metrics that disprove them. We’ll be happy to provide those good metrics for the fine print.