The false confidence theorem and Bayesian reasoning
A little background
I first heard about the False Confidence Theorem (FCT) a number of years ago, although at the time I did not understand why it was meaningful. I later returned to it, and the second time around, with a little more experience (and having found a more useful exposition), its importance was much easier to grasp. I now believe that this result is central to the use of Bayesian reasoning in a wide range of practical contexts, and yet it seems not to be very well known (I was not able to find any mention of it on LessWrong). I think it is at the heart of some common confusions, where seemingly strong Bayesian arguments feel intuitively wrong, but for reasons that are difficult to articulate well. For example, I think it is possibly the central error that Rootclaim made in their lab-leak argument, and although the judges were able to come to the correct conclusion, the fact that seemingly no one was able to specifically nail down this issue has left the surrounding discussion muddled in uncertainty. I hope to help resolve both this and other confusions.
Satellite conjunction
The best exposition of the FCT that I have found is “Satellite conjunction analysis and the false confidence theorem.” The motivating example here is the problem of predicting when satellites are likely to collide with each other, necessitating avoidance maneuvers. The paper starts by walking through a seemingly straightforward application of Bayesian statistics to compute an epistemic probability that 2 satellites will collide, given data (including uncertainty) about their current position and motion. At the end, we notice that very large uncertainties in the trajectories correspond to a very low epistemic belief of collision. Not uncertainty, but rather high confidence of safety. As the paper puts it:
…it makes sense that as uncertainty grows, the risk of collision also grows. Epistemic probability of collision eventually hits a maximum, and past that maximum, as relative uncertainty rises, the epistemic probability of collision decreases. This decrease is called probability dilution, and it has an odd implication. Since the uncertainty in the estimates of the trajectories reflects the limits of data quality, probability dilution seems to imply that lowering data quality makes satellites safer. That implication is counterintuitive in the extreme [4–8]. As a rule, lowering the data quality makes any engineering system less safe, and to claim that ignorance somehow reduces collision risk seems foolish on its face.
And yet, from a Bayesian perspective, we might argue that this makes sense. If we have 2 satellites that look like they are on a collision course (the point estimate of the minimum distance between them is 0), but those estimates are highly uncertain, we might say that the trajectories are close to random. And in that case, 2 random trajectories give you a low collision probability. But reasoning this way, based purely on uncertainty, is an error. You certainly should not become more confident that 2 satellites are safe just because you added random noise to the measurements.
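To see the phenomenon concretely, here is a minimal Monte Carlo sketch. This is my own toy setup, not the paper’s model, and every number and name in it is hypothetical: the two satellites are truly on a collision course, we observe the relative miss vector in the 2-D encounter plane with Gaussian noise of scale sigma, and we compute the flat-prior Bayesian probability that the miss distance is below a combined hard-body radius R. As sigma grows, that epistemic collision probability shrinks toward zero, even though a collision is actually certain.

```python
# Toy demonstration of probability dilution (hypothetical numbers throughout).
import numpy as np

rng = np.random.default_rng(0)
R = 10.0                    # hypothetical combined hard-body radius (metres)
true_miss = np.zeros(2)     # the satellites truly collide: true miss distance is 0

def epistemic_collision_prob(measured_miss, sigma, n_samples=200_000):
    """P(miss distance < R) under the flat-prior posterior N(measured_miss, sigma^2 I)."""
    posterior_draws = rng.normal(measured_miss, sigma, size=(n_samples, 2))
    return np.mean(np.linalg.norm(posterior_draws, axis=1) < R)

for sigma in [5.0, 20.0, 100.0, 1000.0]:
    measured = rng.normal(true_miss, sigma)   # one noisy measurement of the miss vector
    p = epistemic_collision_prob(measured, sigma)
    print(f"sigma = {sigma:6.0f}   P(collision) = {p:.5f}   P(no collision) = {1 - p:.5f}")
```

On these toy numbers, the computed belief in “no collision” climbs toward certainty as the measurement noise grows, which is exactly the dilution the paper describes.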
As it turns out, this problem pops up in a very wide variety of contexts. The paper proves that for any epistemic belief system, there exist false propositions to which it will assign arbitrarily high probability, and that it will do so with arbitrarily high frequentist probability. Indeed:
There is a fixed proposition of practical interest that is guaranteed or nearly guaranteed to be assigned a large epistemic probability, regardless of whether or not it is true… What the false confidence theorem shows is that, in most practical inference problems, there is no theoretical limit on how severely false confidence will manifest itself in an epistemic probability distribution, or more precisely, there is no such limit that holds for all measurable propositions.
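For reference, here is my loose paraphrase of the formal statement (the notation is mine, and I am glossing over the regularity conditions spelled out in the paper). Write θ for the true parameter value, X for the data, and Π_X(A) for the epistemic (e.g. posterior) probability assigned to a proposition A after seeing X. Then:

$$\forall\, \alpha \in (0,1),\ \forall\, p \in (0,1):\ \exists\, A \subseteq \Theta \ \text{with}\ \theta \notin A \ \text{such that}\ \Pr_{X \mid \theta}\!\left(\Pi_X(A) \ge 1-\alpha\right) \ge p.$$

In words: pick any confidence threshold and any frequency you like, and there is some false proposition that will be believed at least that strongly, at least that often.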
Moreover, there is no easy way around this result. It applies to any “epistemic belief system”, i.e. any system of assigning probabilities to statements that obeys the seemingly basic law of probability that P(A) = 1 - P(not A). The problem arises precisely because of that law: if we cannot assign a high probability to A, we must assign substantial probability to not-A. In this case, if we cannot be more than, say, 0.1% sure the satellites will collide, then we have to be at least 99.9% sure that they will not collide.
However, there is one way out (well, one way that preserves the probability rule above). This result is restricted to epistemic uncertainty, that is, uncertainty resulting from an agent’s lack of knowledge, in contrast to aleatory variability, that is, actual randomness in the behavior of the object being studied. A Bayesian might object vehemently to this distinction, but recall the motivating example. If 2 satellites are on a collision course, adding noise to the measurements of their trajectories does not make them safer. However, giving each one a random push from its jets increases the actual variation in their paths, likely pushing them away from the previous point estimate of a collision, and thus does make them safer.
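The same toy model as above makes the distinction explicit (again, my own hypothetical setup and numbers): degrading the measurements leaves the actual collision rate untouched, whereas a genuinely random push to the trajectory really does lower it.

```python
# Epistemic noise vs. aleatory variability in the toy collision model.
import numpy as np

rng = np.random.default_rng(1)
R = 10.0                    # hypothetical combined hard-body radius (metres)
n_conjunctions = 200_000
true_miss = np.zeros(2)     # every conjunction is truly a dead-on collision

# Epistemic noise: degrade the measurements only. The measurements never enter
# the physics, so every single conjunction still ends in a collision.
measurements = rng.normal(true_miss, 1000.0, size=(n_conjunctions, 2))  # observed, but irrelevant to what happens
actual_miss = np.tile(true_miss, (n_conjunctions, 1))
print("collision rate, noisy measurements:", np.mean(np.linalg.norm(actual_miss, axis=1) < R))

# Aleatory variability: give each pair a real random push of scale tau.
# The trajectories themselves change, so collisions genuinely become rare.
tau = 1000.0
pushed_miss = true_miss + rng.normal(0.0, tau, size=(n_conjunctions, 2))
print("collision rate, real random push: ", np.mean(np.linalg.norm(pushed_miss, axis=1) < R))
```

The first rate is exactly 1; the second is close to 0. Only the second intervention changed anything about the world.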
The practical take-away
It is inappropriate to conflate subjective uncertainty with actual variation when reasoning under uncertainty. Doing so can result in errors of arbitrary magnitude. This phenomenon can occur, for example, when a key estimate relies on a highly uncertain parameter. Saying, “I don’t know much about this subject, but it would be overconfident to say this probability is less than 10%” sounds safe and prudent. But your lack of knowledge does not actually constrain the true value. It could in reality be 1⁄100, or 1⁄10,000, or 1⁄1,000,000. This arbitrarily severe error can then be carried forward, for example if the probability in question is used to compute a Bayes factor; both it and the final answer will then be off by the same (possibly very high) ratio.
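Here is a small worked example of how the error carries forward. Every number is made up purely for illustration: if the likelihood you guessed out of ignorance is 100x too high, the Bayes factor you compute, and therefore your posterior odds, are 100x too high as well.

```python
# Made-up numbers: a guessed likelihood vs. a hypothetical true frequency.
p_E_given_H1_guess = 0.10    # "it would be overconfident to say less than 10%"
p_E_given_H1_true  = 0.001   # what the true frequency might actually be
p_E_given_H0       = 0.005   # likelihood under the alternative hypothesis (also made up)

bf_guess = p_E_given_H1_guess / p_E_given_H0   # 20: looks like strong evidence for H1
bf_true  = p_E_given_H1_true  / p_E_given_H0   # 0.2: actually mild evidence against H1

prior_odds = 1.0                               # start indifferent between H1 and H0
print("posterior odds from the guess:", prior_odds * bf_guess)   # 20.0
print("posterior odds from the truth:", prior_odds * bf_true)    # 0.2
print("both off by a factor of:", bf_guess / bf_true)            # 100.0
```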
Perhaps an alternative way of phrasing this fact is simply to say that uncertainty is not evidence. Bayes’ theorem tells you how to incorporate evidence into your beliefs. You can certainly incorporate uncertainty into your beliefs as well, but you cannot treat the two the same way.
Example 1: Other people’s (lack of) confidence
Back in the day, Scott Alexander asked the following question in reference to komponisto’s claim that the probability of Amanda Knox’s guilt is on the order of 1 in 1,000, when LW commenters had given an average of 35%:
Out of one thousand criminal trials in which the Less Wrong conventional wisdom gave the defendant a 35% chance of being guilty, you would expect to be able to correctly determine guilt nine hundred ninety nine times?
In fact, komponisto was entirely correct to be confident. 35% did not represent a true evaluation of AK’s probability of guilt based on all of the available evidence. Many commenters, by their own admission, had not thoroughly investigated the case; 35% simply represented their epistemic uncertainty on a topic they had not investigated. If every commenter had thoroughly researched the case and the resulting average was still 35%, one could ask whether komponisto was being overconfident, but as it stood, the commenters’ average and his number represented entirely different things, and it would be rather meaningless to compare them.
One may as well survey the community on whether a coin will come up heads or tails, and then, after I flip it and proclaim that it definitely came up heads, accuse me of being overconfident. After all, a hundred rationalists said it was 50/50! (Or, to take a slightly less silly example, suppose the coin is known to be biased, but I am the only one who has researched how biased, or in which direction.)
Example 2: Heroic Bayesian analysis
In Rootclaim’s most recent COVID origins analysis, the single strongest piece of evidence is “12 nucleotides clean insertion,” which they claim is 20x more likely under lab leak (after out-of-model correction). Specifically, they say it is 10% likely under lab leak, based on the following “guesstimate:”
In the past, FCSs have been added by substitution rather than insertion, but it is not difficult to do it by insertion. We cannot be sure of the exact considerations of the lab researchers who added the FCS, such as investigating the role of proline. Therefore, we assign it a 10% probability.
So, they do not have any evidence that, across all cases when researchers might try to add an FCS to a virus, they use a “12 nucleotide clean insertion” 1 time out of 10. They simply provide a guess, based on their own lack of knowledge. This is exactly the error described above: For all they actually know, the true frequency of this behavior could be 1⁄1,000, an error of 100x, or it could be even worse.
It is simply not valid to claim strong evidence for no reason other than your own certainty. Doing so is perverse in the extreme, and would make it trivial to become completely confident simply by ignoring as much evidence as possible. The only valid conclusion to draw from this lack of knowledge is that you are unable to evaluate the evidence in question, and must remain uncertain.
So what should you do instead?
I believe that avoiding false confidence (at least when epistemic uncertainty is unavoidable) essentially comes down to explicitly carrying that uncertainty through to your final probability estimate. The satellite conjunction paper offers a solution which bounds the probability of collision, and which can be proven to actually achieve the desired safety level. The key fact is that we are not claiming an exact value for P(collision) or its complement. The solution from the satellite paper is based on “confidence regions,” i.e.
…a confidence region represents the simple assertion that we are 1 − α confident that the true value of θ is somewhere inside Γα(x). Any sets containing Γα(x) inherit that confidence; all other sets accrue no positive confidence… for any false proposition, i.e. any set A such that A∌θ, the probability that said proposition will be assigned a confidence of at least 1 − α is less than or equal to α
For the specific satellite case, the solution is to compute uncertainty ellipsoids for each object, and check if they overlap at the point of closest approach. In this case, the probability of collision can indeed be limited:
so long as one performs a manoeuvre whenever the two uncertainty ellipsoids intersect, the rate at which collisions occur over a large number of conjunctions—i.e. the operational aleatory probability of collision—will be capped at α′ = 2α.
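As a sketch of what that decision rule might look like in code, here is a deliberately simplified version: a 2-D encounter plane, hypothetical numbers, the hard-body radius ignored, and the exact ellipse-intersection test replaced by circumscribed circles (which can only trigger extra manoeuvres, so the safety cap is preserved at the cost of some unnecessary ones). The function names are my own, not the paper’s.

```python
# Conservative sketch of the "manoeuvre if the confidence regions overlap" rule.
import numpy as np
from scipy.stats import chi2

def circumscribed_radius(cov, alpha):
    """Radius of the circle that contains the 1 - alpha confidence ellipse of N(mu, cov)."""
    k = chi2.ppf(1.0 - alpha, df=2)                 # Mahalanobis-distance threshold in 2-D
    return np.sqrt(k * np.max(np.linalg.eigvalsh(cov)))

def should_manoeuvre(mu1, cov1, mu2, cov2, alpha=0.001):
    """Manoeuvre whenever the (conservatively circumscribed) confidence regions overlap."""
    separation = np.linalg.norm(np.asarray(mu1) - np.asarray(mu2))
    return separation <= circumscribed_radius(cov1, alpha) + circumscribed_radius(cov2, alpha)

# Hypothetical position estimates and covariances at the point of closest approach (metres):
mu1, cov1 = [0.0, 0.0], np.diag([50.0**2, 20.0**2])
mu2, cov2 = [300.0, 0.0], np.diag([80.0**2, 30.0**2])
print(should_manoeuvre(mu1, cov1, mu2, cov2))       # True for these particular numbers
```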
These tools are, in some sense, “crude” ways of representing belief, as they do not reflect the full richness of the axioms of probability theory. And yet, they may be of great practical use.
Conclusion
It is perhaps quite surprising that attempting to force your beliefs to respect the seemingly obvious law of probability that P(A) = 1 - P(not A) can result in errors. Not just that, but it is in fact guaranteed to result in errors that are arbitrarily bad. Moreover, contrary to what “pure” or “naive” Bayesianism might suggest, there is in fact a very significant, practical difference between subjective uncertainty and aleatory variability. Nevertheless, the results seem to be on very solid mathematical ground, and once we dive into what they are really saying, they make a lot more intuitive sense.
Additional links
https://en.wikipedia.org/wiki/False_confidence_theorem
https://arxiv.org/abs/1807.06217