Political Alignment of LLMs

TLDR: Constructing an unbiased LLM presents the challenge of determining what constitutes an objective viewpoint. Here I propose a forecasting-based technique for solving this problem. All feedback, particularly from people with expertise in AI alignment and LLM training, would be highly appreciated.

Applying Forecasting to Bias Measurement

First, a short background story to establish my credentials:

Six years ago, IARPA conducted an experiment involving five hundred forecasters making probabilistic predictions on three hundred geopolitical issues. The predictions were aggregated via a wisdom-of-crowds algorithm which assigned each forecaster a weight proportional to their past accuracy. Simultaneously, IARPA launched a public competition promising a $250,000 reward for improving its algorithm’s accuracy by at least 20%.

One of the most effective techniques that helped me win this contest was accounting for forecasters’ biases in addition to their general accuracy. For example, on politically charged questions, forecasters tend to make systematic errors that reflect their political preferences.

These bias-driven errors can be modeled as the dot product of two vectors:

Error = Cj · Bi

Where:

  • Bi represents the bias vector of forecaster i across multiple political dimensions

  • Cj represents the political charge vector of question j along the same dimensions

For example, a politically neutral question (“Will it rain in Paris tomorrow?”) would have a near-zero Cj, leading to minimal bias-related error. In contrast, a politically loaded question (“Will inflation rise under the Trump administration?”) would have a high-magnitude Cj, indicating strong divergence between the predictions of left- and right-leaning forecasters.
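
To make the dot product concrete, here is a toy numerical sketch (all numbers are invented for illustration, with one left/right axis and one economic-optimism axis):

```python
import numpy as np

# Toy bias vectors: one "left/right" axis and one "economic optimism" axis.
B_left  = np.array([-0.3, 0.1])   # left-leaning forecaster
B_right = np.array([ 0.3, 0.1])   # right-leaning forecaster

# Toy political charge vectors for the two example questions.
C_rain      = np.array([0.0, 0.0])   # "Will it rain in Paris tomorrow?"
C_inflation = np.array([0.8, 0.2])   # "Will inflation rise under the Trump administration?"

# Expected bias-driven error (in probability points) is the dot product Cj · Bi.
print(C_rain @ B_left, C_rain @ B_right)            # 0.0  0.0     -> no systematic error
print(C_inflation @ B_left, C_inflation @ B_right)  # ~ -0.22 0.26 -> opposite-signed errors
```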

Given a sufficient history of prediction errors, singular value decomposition (SVD) or low-rank matrix factorization can be used to infer each forecaster’s bias vector. Once these biases are known, their effect (Cj · Bi) can be subtracted from all forecasts, including those on questions that have not yet been resolved.
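
As a minimal sketch of this step (variable names are mine, and a real implementation would need to handle missing entries and add regularization), a truncated SVD of the signed-error matrix recovers bias vectors for forecasters and charge vectors for questions:

```python
import numpy as np

def fit_bias_model(errors: np.ndarray, rank: int = 2):
    """Factor an (n_forecasters x n_questions) matrix of signed forecast
    errors into forecaster bias vectors B and question charge vectors C,
    so that errors is approximated by B @ C.T."""
    U, s, Vt = np.linalg.svd(errors, full_matrices=False)
    B = U[:, :rank] * s[:rank]   # bias vectors, shape (n_forecasters, rank)
    C = Vt[:rank, :].T           # charge vectors, shape (n_questions, rank)
    return B, C

def debias(forecasts: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Subtract the predicted bias term Cj · Bi from every forecast.
    For unresolved questions, Cj would have to be estimated separately
    (e.g., from question features or from the spread of forecasts)."""
    return np.clip(forecasts - B @ C.T, 0.0, 1.0)
```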

Debiasing Large Language Models

A similar approach can be adapted to measure political bias in LLMs:

  1. Prompt different LLMs—or the same model under different sampling seeds, fine-tunings, or RLHF configurations—to make verifiable predictions on a set of politically controversial questions.

  2. Once the forecast questions have resolved, use their outcomes to estimate each model’s political bias vector (see the sketch below).
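
Here is a sketch of how this measurement loop might look, assuming a hypothetical ask_for_probability(model, question) helper that prompts a model and parses a probability from its answer, and reusing fit_bias_model from the sketch above:

```python
import numpy as np

def measure_llm_bias(models, questions, outcomes, ask_for_probability, rank=2):
    """models: model identifiers (or model/seed/fine-tune configurations).
    questions: politically charged but objectively verifiable forecast questions.
    outcomes: resolved outcomes (0 or 1), one per question.
    ask_for_probability: hypothetical helper that prompts a model with a
        question and parses a probability in [0, 1] from its answer."""
    forecasts = np.array([[ask_for_probability(m, q) for q in questions]
                          for m in models])
    errors = forecasts - np.asarray(outcomes)   # signed error per model/question
    B, C = fit_bias_model(errors, rank=rank)    # defined in the earlier sketch
    return B, C                                 # B[i] ~ political bias vector of models[i]
```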

To remove political bias from existing models, we can use a four-step method:

  1. Prompt different LLMs—or the same model under different sampling seeds, fine-tunings, or RLHF configurations—to make verifiable predictions on a set of politically controversial questions.

  2. Ask these same models to rate the quality and political bias of various political content items (e.g., news articles, opinion pieces, or LLM responses to political queries).

  3. Once the forecast questions have resolved, use their outcomes to estimate each model’s political bias vector. Then, subtract the bias effects (Cj · Bi) from their content ratings.

  4. Train a bias scorer on the corrected ratings from the previous step. Use it to guide the LLM via preference optimization or RL with a bias penalty (see the sketch below).
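
Below is a minimal sketch of steps 3 and 4 under some simplifying assumptions: the charge vectors of the content items are taken as given (in practice they would themselves be estimated, e.g., by factoring the rating residuals), and the bias scorer is a TF-IDF plus ridge-regression stand-in for what would realistically be an LLM-based reward model:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

def correct_ratings(ratings, B, C_items):
    """ratings: (n_models x n_items) raw content ratings.
    B: (n_models x rank) estimated model bias vectors (step 3).
    C_items: (n_items x rank) estimated political charge of each content item.
    Returns ratings with the Cj · Bi bias term subtracted."""
    return ratings - B @ C_items.T

def train_bias_scorer(texts, corrected_ratings):
    """Fit a simple scorer that predicts the bias-corrected consensus rating
    of a piece of political content (step 4)."""
    targets = corrected_ratings.mean(axis=0)          # consensus across models
    vectorizer = TfidfVectorizer(max_features=20_000)
    X = vectorizer.fit_transform(texts)
    scorer = Ridge(alpha=1.0).fit(X, targets)
    return vectorizer, scorer

# The scorer's output (or its gap from a neutral baseline) can then be added
# as a penalty term to the reward used in RLHF or preference optimization.
```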

I am very interested in your opinions on this approach. In particular, I would like to know:

  • Do you see any plausible reasons it might fail?

  • Do you know anyone who might be interested in testing it?

  • Do you expect a significant demand for unbiased LLMs, or would people overwhelmingly prefer models that share their own biases?