Thanks for this and its companion post; I found the two posts very interesting, and I think they’ll usefully inform some future work for me.
A few thoughts came to mind as I read. Some of them could sort of be seen as pushing back against certain claims, but in ways that I think aren’t very important and that I expect you’ve already considered. I’ll split these into separate comments.
Firstly, as you note, what you’re measuring is how well predictions match a proxy for the truth (the proxy being Elizabeth’s judgement), rather than the truth itself. Something I think you don’t explicitly mention is that:
Elizabeth’s judgement may be biased in some way (rather than just randomly erring), and
The network-based forecasters’ judgements may be biased in a similar way, and therefore
This may “explain away” part of the apparent value of the network-based forecasters’ predictions, along with part of their apparent superiority over the online crowdworkers’ predictions.
E.g., perhaps EA-/rationality-adjacent people are biased towards disagreeing with “conventional wisdom” on certain topics, and this bias is somewhat shared between Elizabeth and the network-based forecasters. (I’m not saying this is actually the case; it’s just an example.)
You make a somewhat similar point in the Part 2 post, when you say that the online crowdworkers:
were operating under a number of disadvantages relative to other participants, which means we should be careful when interpreting their performance. [For example, the online crowdworkers] did not know that Elizabeth was the researcher who created the claims and would resolve them, and so they had less information to model the person whose judgments would ultimately decide the questions.
But that point is about participants’ ability to deliberately predict what Elizabeth will say, rather than about them happening to share Elizabeth’s biases when both are trying to judge the ground truth.
In any case, I don’t think this matters much. One reason is that this “shared bias” issue probably at most “explains away” a relatively small fraction of the apparent value of the network-based forecasters’ predictions, probably without tipping the balance of whether this sort of set-up is worthwhile. Another reason is that there may be ways to mitigate this “shared bias” issue.
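To make the mechanic in point 3 concrete, here’s a toy simulation (all numbers are made up for illustration; this isn’t meant to model the actual study). Each claim has a binary ground truth and a binary “contrarian pull”; every judge reports the truth 70% of the time and otherwise reports a bias direction. A forecaster who shares the resolver’s bias direction scores better against the proxy than a forecaster with an unshared bias, even though both are equally accurate against the truth:

```python
import random

random.seed(0)
N = 100_000

def judge(truth, pull):
    """Report the truth 70% of the time; otherwise report 'pull'
    (the judge's bias direction). Illustrative numbers only."""
    return truth if random.random() < 0.7 else pull

proxy_agree = {"shared": 0, "independent": 0}
truth_agree = {"shared": 0, "independent": 0}

for _ in range(N):
    truth = random.randint(0, 1)   # ground truth of the claim
    pull = random.randint(0, 1)    # direction of the shared bias
    resolver = judge(truth, pull)  # stands in for the resolver's judgement
    shared = judge(truth, pull)    # forecaster who shares the resolver's bias
    independent = judge(truth, random.randint(0, 1))  # unshared bias

    proxy_agree["shared"] += shared == resolver
    proxy_agree["independent"] += independent == resolver
    truth_agree["shared"] += shared == truth
    truth_agree["independent"] += independent == truth

for k in ("shared", "independent"):
    print(f"{k}: vs proxy {proxy_agree[k] / N:.3f}, vs truth {truth_agree[k] / N:.3f}")
```

With these made-up parameters, both forecasters are right against the truth roughly 85% of the time, but the shared-bias forecaster agrees with the resolver noticeably more often (roughly 0.79 vs 0.74) — i.e., the shared bias inflates measured performance against the proxy without any real edge in accuracy.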
Thanks for the attention on this point.
I think I’m very nervous about trying to get at “Truth”. I definitely don’t mean to claim that we were confident that this work gets us much closer to truth; it’s more that it can help move a process of deliberation forward. The expectation is that it can get us closer to the truth than most other methods, but we’ll still be several steps away.
I imagine that there are many correlated mistakes society is making. It’s really difficult to escape that. I’d love for future research to make attempts here, but I suspect it’s a gigantic challenge, for both research and social reasons. For example, in ancient Egypt, I believe it would have taken some intense deliberation both to realize that the popular religion was false and to be allowed to say so.