One key question is where this argument fails—because as noted, superforecasters are often very good, and most of the time, listing failure modes or listing what you need is effective.
I think the answer is adversarial domains, that is, domains where there is explicit pressure to find alternatives. The obvious place this happens is when you’re actually facing a motivated opponent, like the scenario of AI trying to kill people, or cybersecurity intrusions. There, by construction, the blocked examples don’t contain much probability mass, since the opponent is actually blocked and picks other routes. Something similar happens with arguments: the arguer’s goal, and their selection of arguments, are often motivated beforehand, so they will pick other “routes” through the argument, and really good arguers will take advantage of this, as noted. And this is somewhat different from the Fatima Sun Miracle, where the selection pressure for proofs of God was to find examples of something people couldn’t explain and then use that, rather than selection on the arguments themselves.
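To make the probability-mass point concrete, here is a minimal toy simulation (entirely my own sketch, with made-up numbers of routes and blocked failure modes, not anything from the post): against a non-adversarial process, listing ten of a hundred routes covers roughly ten percent of the outcomes, while against an opponent who routes around your list it covers essentially none.

```python
import random

# Toy illustration (my own made-up numbers): a defender can enumerate and
# block only a few of the many routes an attacker could take.
random.seed(0)

N_ROUTES = 100      # routes that exist
N_LISTED = 10       # failure modes the defender manages to list and block
N_TRIALS = 10_000

covered_random, covered_adversarial = 0, 0
for _ in range(N_TRIALS):
    blocked = set(random.sample(range(N_ROUTES), N_LISTED))

    # Non-adversarial process: the route is picked at random, so the listed
    # failure modes cover about N_LISTED / N_ROUTES of the probability mass.
    covered_random += random.randrange(N_ROUTES) in blocked

    # Adversarial process: the opponent deliberately picks an unblocked route,
    # so the listed failure modes cover essentially none of the mass.
    open_routes = [r for r in range(N_ROUTES) if r not in blocked]
    covered_adversarial += random.choice(open_routes) in blocked

print(f"Coverage, random process:      {covered_random / N_TRIALS:.3f}")       # ~0.100
print(f"Coverage, adversarial process: {covered_adversarial / N_TRIALS:.3f}")  # 0.000
```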
In contrast, what Rethink did for theories of consciousness seems different: there is no a priori reason to think that most of the probability mass lies outside of what we think about, since how consciousness works is not understood, but the domain is not adversarial. And moving away from the point of the post, the conclusion should be that we know we’re wrong, because we haven’t dissolved the question, but we can still try our theories, since they seem likely to be at least near the correct explanation, even if we haven’t found it yet. And when you don’t have correct theories, using heuristics, “just read the behavioural observations on different animals and go off of vibes,” rather than theories is a reasonable move, but that’s a completely different discussion!
As an aside, the formalisms that deal with this properly are not Bayesian; they are built for nonrealizable settings. See Diffractor and Vanessa’s work, for example: https://arxiv.org/abs/2504.06820v2
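For anyone unfamiliar with the term, my rough gloss (standard usage, though the linked paper’s setup may differ in detail): write $\mu$ for the true environment, $\mathcal{H}$ for the agent’s hypothesis class, and $P_h$ for the distribution hypothesis $h$ predicts. Then:

```latex
\text{Realizable: } \exists\, h^{*} \in \mathcal{H} \ \text{such that}\ \mu = P_{h^{*}}
\qquad \text{vs.} \qquad
\text{Nonrealizable: } \mu \notin \{\, P_h : h \in \mathcal{H} \,\}
```

In the nonrealizable case every hypothesis the agent wrote down is wrong to some degree, which is the formal analogue of the “true route is outside your list” situation above; as I understand it, that is the setting Diffractor and Vanessa’s line of work is designed to handle.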
Also, my experience with actual superforecasters, as opposed to people who forecast in EA spaces, has been that this failure mode is quite common and problematic even outside of existential risk, for example in forecasts during COVID, especially early on.