Deciding which predictions were accurate can indeed be an issue, but in practice it rarely causes problems unless the resolution criteria are ambiguous. During forecasting tournaments, forecasters working on a question are expected to adjust their predictions to its fine print (such as which source of inflation data will be used to resolve it).
Regarding politics affecting the selection of questions — can you explain why this would be a problem?
Thank you for the thoughtful reply. I’ll try to respond to it point by point.
This does complicate forecasting, but the two effects are unlikely to cancel each other perfectly. If the two effects are very close in magnitude, the question’s political charge $C_j$ would be close to zero. This would not break the method; it would just require a larger number of questions to estimate the models’ bias accurately.
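To make the sample-size point concrete, here is a minimal sketch. It assumes the bias is estimated by regressing signed forecast errors on the per-question charges $C_j$; that estimator is my reading of the method, not something quoted above:

```python
import numpy as np

def estimate_bias(errors: np.ndarray, charges: np.ndarray) -> tuple[float, float]:
    """Through-the-origin regression of signed forecast errors e_j on
    political charges C_j. Returns (bias estimate, standard error).

    errors:  signed forecast errors, one per resolved question
    charges: the corresponding political charges C_j (signed; near zero
             when opposing effects almost cancel)
    """
    b = np.dot(charges, errors) / np.dot(charges, charges)
    resid = errors - b * charges
    # The standard error scales as 1 / sqrt(sum of C_j^2): near-zero
    # charges inflate it, so the same precision needs more questions.
    se = np.sqrt(np.dot(resid, resid) / (len(errors) - 1)
                 / np.dot(charges, charges))
    return b, se
```

Since the standard error scales as $1/\sqrt{\sum_j C_j^2}$, halving every charge requires roughly four times as many questions for the same precision; near-zero charges don’t break the estimate, they just make it data-hungry.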
Typically, you do not need to ask questions about an issue directly in order to measure bias on it. For example, biases about the current war in Ukraine are strongly correlated with biases about US domestic issues, so it would be impossible to preserve an LLM’s bias about Ukraine simply by removing all Ukraine-related questions.
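A toy illustration of this cross-topic leakage, using synthetic numbers; the single latent “partisan lean” factor is an assumption for the demo, not measured data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed generative model: one latent partisan lean per model drives its
# bias on both topics, so per-topic bias estimates correlate across models.
lean = rng.normal(size=500)                        # one latent lean per model
bias_domestic = lean + 0.3 * rng.normal(size=500)  # bias on US domestic questions
bias_ukraine = 0.8 * lean + 0.3 * rng.normal(size=500)  # bias on Ukraine questions

r = np.corrcoef(bias_domestic, bias_ukraine)[0, 1]
print(f"cross-topic correlation: {r:.2f}")  # ~0.9 under these assumptions
# At correlations like this, a model's Ukraine bias is largely recoverable
# from its bias on domestic questions alone, so dropping Ukraine-related
# questions cannot shield that bias from the debiasing procedure.
```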
Certainly. For example, it cannot be logically proven that viewing income inequality as good or bad in itself is wrong. In practice, however, most arguments about inequality focus on its social consequences, and that is where the bias manifests itself. So a debiased LLM would not be able to give a reasoned answer on whether income inequality is good or bad in itself, but it should be able to correctly describe its impact on economic growth, crime, etc.