Thank you for the thoughtful reply. I’ll try to respond to it point by point.

> If you believe that Trump is going to drive up inflation, I expect you’re more likely to believe that Trump is also going to manipulate the statistics.
This does complicate forecasting, but the two effects are unlikely to cancel each other out exactly. If the two effects are very close in magnitude, the question’s political charge $C_j$ would be close to zero. This would not compromise the method; it would only require a larger number of questions to calculate the models’ bias accurately.
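To make the “larger number of questions” point concrete, here is a minimal sketch in Python. It assumes a simple linear model in which a model’s forecast error on question j equals its bias B times the charge $C_j$ plus noise; that estimator, and names like `estimate_bias`, are illustrative assumptions, not necessarily the exact formula from the post. When the charges are near zero, each question carries little signal about B, so many more questions are needed before the estimate settles down:

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_bias(charges, errors):
    """Least-squares estimate of B under the assumed linear model
    error_j = B * C_j + noise_j (an illustrative model, not
    necessarily the one used in the post)."""
    charges = np.asarray(charges, dtype=float)
    errors = np.asarray(errors, dtype=float)
    return charges @ errors / (charges @ charges)

true_bias = 0.4  # hypothetical bias we are trying to recover
noise_sd = 0.2   # forecast noise unrelated to politics

for n_questions, charge_scale in [(100, 1.0), (100, 0.05), (10_000, 0.05)]:
    # When charge_scale is small, every C_j is close to zero.
    charges = rng.normal(0.0, charge_scale, size=n_questions)
    errors = true_bias * charges + rng.normal(0.0, noise_sd, size=n_questions)
    print(n_questions, charge_scale, round(estimate_bias(charges, errors), 3))
```

With strongly charged questions, 100 of them recover the bias well; with near-zero charges, the same 100 give a very noisy estimate, and it takes on the order of 10,000 to reach comparable accuracy.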
> I don’t want that bias removed, so I’m going to resist any question that measures Y.
Typically, to calculate bias on a particular issue you do not need to ask questions about that issue directly. For example, the biases about the current war in Ukraine are strongly correlated with the biases about US domestic issues. So, it would be impossible to preserve the LLM’s bias about Ukraine simply by removing all Ukraine-related questions.
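Here is a minimal sketch of why that correlation matters (Python; the one-factor structure, the loadings, and names like `domestic_bias` are illustrative assumptions, not taken from the post): if models’ per-issue biases share a common latent lean, most of a model’s Ukraine bias can be read off from its answers on domestic questions alone, so dropping the Ukraine questions does not hide it.

```python
import numpy as np

rng = np.random.default_rng(1)
n_models = 500

# Hypothetical one-factor model: each LLM has a latent ideological lean,
# and its per-issue biases are noisy loadings on that lean.
lean = rng.normal(0.0, 1.0, n_models)
domestic_bias = 0.9 * lean + rng.normal(0.0, 0.3, n_models)
ukraine_bias = 0.8 * lean + rng.normal(0.0, 0.3, n_models)

print("correlation:", round(float(np.corrcoef(domestic_bias, ukraine_bias)[0, 1]), 2))

# Even with every Ukraine question removed from the pool, the part of the
# Ukraine bias predictable from domestic answers is still measurable:
slope = (domestic_bias @ ukraine_bias) / (domestic_bias @ domestic_bias)
unexplained = ukraine_bias - slope * domestic_bias
print("ukraine bias sd, raw vs. unexplained:",
      round(float(ukraine_bias.std()), 2), round(float(unexplained.std()), 2))
```

Under these made-up loadings the two biases correlate at roughly 0.9, and regressing on the domestic bias accounts for most of the variance in the Ukraine bias.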
> It’s possible for everybody to expect exactly the same set of consequences from some policy or action, but disagree about whether the final outcome is good or bad.
Certainly. For example, it cannot be logically proven that viewing income inequality as good or bad in itself is wrong. However, in practice, most arguments about inequality focus on its social consequences, which is where the bias manifests itself. So, a debiased LLM would not be able to give a reasoned answer on whether income inequality is good or bad in itself, but it should be able to correctly describe its impact on economic growth, crime, etc.
> Typically, to calculate bias on a particular issue you do not need to ask questions about that issue directly. For example, the biases about the current war in Ukraine are strongly correlated with the biases about US domestic issues. So, it would be impossible to preserve the LLM’s bias about Ukraine simply by removing all Ukraine-related questions.
Doesn’t that mean that I’m now simply motivated to attack whole clusters of correlated questions? And for that matter, doesn’t that mean that if, say, I care most about defending bias on Ukraine, I have an incentive to collude with others involved in the process who care more about the domestic issues? My opponents have the same incentives, so it seems to me you’re at great risk of importing all of the outside factions into the pool of people selecting the questions.
> However, in practice, most arguments about inequality focus on its social consequences, which is where the bias manifests itself.
I dunno. I agree people argue based on consequences, but I also think that there’s a lot more feed-forward than anybody would like to admit. If I’m fundamentally in favor of inequality, then I’m motivated to go confirmation-bias myself into believing it has more positive consequences and fewer negative ones.
Of course I’ll then use those beliefs to argue for more inequality… but even if I’m forced to give up one or another belief, that doesn’t mean I’ll reexamine my underlying pro-inequality values, and I probably have a bunch of other similar beliefs on tap. If I’m a pro-inequality advocate, friends and I probably spend a fair amount of time sitting around thinking of new advantages of inequality, and/or new disadvantages of equality.
And, going back to the question selection thing, it doesn’t seem unlikely that I’ll try to defend my beliefs about the consequences of inequality by trying either to avoid anybody going out and actually measuring outcomes, or to bias the measurements in one way or another. While my friends and I are thinking of those new consequences, we’re probably also on the lookout for high-quality metrics that prove them, as opposed to any obviously bogus metrics that disprove them. We’ll be happy to provide those good metrics for the fine print.