Some hobby Bayesianism. A typical challenge for a rationalist is that there is some claim X to be evaluated: it seems preposterous, but many people believe it. How should you take account of this when considering how likely X is to be true? I’m going to propose a mathematical model of this situation and discuss two of its features.
This is based on a continuing discussion with Unknowns, who I think disagrees with what I’m going to present, or with its relevance to the “typical challenge.”
Summary: If you learn that a preposterous hypothesis X is believed by many people, you should not correct your prior probability P(X) by a factor larger than the reciprocal of P(Y), your prior probability for the hypothesis Y = “X is believed by many people.” One can deduce an estimate of P(Y) from an estimate of the quantity “if I already knew that at least n people believed X, how likely it would be that n+1 people believed X” as a function of n. It is not clear how useful this method of estimating P(Y) is.
The right way to unpack “X seems preposterous, but many believe it” mathematically is as follows. We have a very low prior probability P(X), and then we have new evidence Y = “many people believe X”. The problem is to evaluate P(X|Y).
One way to phrase the typical challenge is “How much larger than P(X) should P(X|Y) be?” In other words, how large is the ratio P(X|Y)/P(X)? Bayes formula immediately says something interesting about this:
P(X|Y)/P(X) = P(Y|X)/P(Y)
Moreover, since P(Y|X) < 1, the right-hand side of that equation is less than 1/P(Y). My interpretation of this: if you want to know how seriously to take the fact that many people believe something, you should consider how likely you find it that many people would believe it absent any evidence. Or a little more precisely, how likely you find it that many people would believe it if the amount of evidence available to them was unknown to you. You should not correct your prior for X by more than the reciprocal of this probability.
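As a numerical illustration of this bound (both probabilities below are invented for the sake of the example, not estimates of anything):

```python
# Sketch of the bound P(X|Y)/P(X) = P(Y|X)/P(Y) <= 1/P(Y).
# Both probabilities below are invented for illustration.
p_Y = 0.01          # prior probability that many people believe X
p_Y_given_X = 0.5   # chance many would believe X, given X is true

bayes_factor = p_Y_given_X / p_Y  # multiplier applied to the prior P(X)
ceiling = 1.0 / p_Y               # the bound, since P(Y|X) <= 1

print(bayes_factor)             # 50.0
print(bayes_factor <= ceiling)  # True
```

So here the update is a factor of 50, half the ceiling of 1/P(Y) = 100; the ceiling is approached exactly as P(Y|X) approaches 1.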
Comment: how much less than 1 P(Y|X) is depends on the nature of X. For instance, if X is the claim “the Riemann hypothesis is false” then it is unclear to me how to estimate P(Y|X), but (since it is conceivable to me that RH is false and yet still widely believed to be true) it might be quite small. If X is an everyday claim like “it’s a full moon tomorrow”, or a spectacular claim like “Jesus rose from the dead”, it seems like P(Y|X) is very close to 1. So sometimes 1/P(Y) is a good approximation to P(X|Y)/P(X), but sometimes it may be a large overestimate.
What about P(Y)? Is there a way to estimate it, or at least approach its estimation? Let’s give ourselves a little more to work with, by quantifying “many people” in “many people believe X”. Let Y(n) be the assertion “at least n people believe X.” Note that this model doesn’t specify what “believe” means—in particular it does not specify how strongly n people believe X, nor how smart or expert those n people are, nor where in the world they are located… if there is a serious weakness in this model it might be found here.
Another application of Bayes theorem gives us
P(Y(n+1))/P(Y(n)) = P(Y(n+1)|Y(n))
(Since P(Y(n)|Y(n+1)) = 1, i.e. if we know at least n+1 people believe X, then of course at least n people believe X.) Squinting a little, this gives us a formula for the derivative of the logarithm of P(Y(n)). Yudkowsky has suggested naming the log of a probability an “absurdity”; let’s write A(Y(n)) for the absurdity of Y(n).
d/dn A(Y(n)) = A(Y(n+1)|Y(n))
So up to an additive constant, A(Y(n)) is the integral from 1 to n of A(Y(m+1)|Y(m)) dm, and an ansatz for P(Y(n+1)|Y(n)) = exp(A(Y(n+1)|Y(n))) will allow us to say something about P(Y(n)), up to a multiplicative constant.
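The telescoping behind this is easy to check numerically. A minimal sketch, in which the conditional probabilities are arbitrary illustrative values rather than estimates:

```python
import math

# Sketch: P(Y(n)) = P(Y(1)) * product of P(Y(m+1)|Y(m)) for m = 1..n-1,
# so the absurdity A(Y(n)) = log P(Y(n)) is the corresponding sum of
# log-conditionals (the discrete analogue of the integral in the text).
# All probabilities here are arbitrary illustrative values.
p_Y1 = 0.5                           # P(Y(1))
conditionals = [0.9, 0.8, 0.7, 0.6]  # P(Y(m+1)|Y(m)) for m = 1..4

p_Y5 = p_Y1
for q in conditionals:
    p_Y5 *= q                        # telescoping product gives P(Y(5))

A_Y5 = math.log(p_Y1) + sum(math.log(q) for q in conditionals)

print(abs(p_Y5 - math.exp(A_Y5)) < 1e-12)  # True
```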
The shape of P(Y(n+1)|Y(n)) seems like it could have a lot to do with what kind of statement X is, but there is one thing that seems likely to be true no matter what X is: if N is the total population of the world and n/N is close to zero, then P(Y(n+1)|Y(n)) is also close to zero, and if n/N is close to one then P(Y(n+1)|Y(n)) is also close to one. I might work out an example ansatz like this in a future comment, if this one stands up to scrutiny.
Here is my proposal for an ansatz for P(Y(n+1)|Y(n)). That is, given that at least n people already believe X, how likely is it that at least one more person also believes X? Let N be the total population of the world. If n/N is close to zero, then I expect P(Y(n+1)|Y(n)) is also close to zero, and if n/N is close to 1, then P(Y(n+1)|Y(n)) is also close to 1. That is, if I know that a tiny proportion of people believe something, that’s very weak evidence that a slightly larger proportion believe it also, and if I know that almost everyone believes it, that’s very strong evidence that even more people believe it.
One family of functions with this property is f(n) = (n/N)^C, where C is some fixed positive number. Actually it’s convenient to set C = c/N, where c is some other fixed positive number. I don’t have a story to tell about why P(Y(n+1)|Y(n)) should behave this way; I bring it up only because f(n) does the right thing near 1 and N, and is pretty simple.
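As a sanity check on the endpoint behavior, the following sketch evaluates f at n = 1 and n = N − 1 (N and C here are small illustrative values chosen for readability, not world-scale estimates):

```python
# Sketch: check that the ansatz f(n) = (n/N)**C is near 0 when n/N is
# near 0, and near 1 when n/N is near 1. N and C are small illustrative
# values, not estimates of anything.
N = 1000
C = 2.0

def f(n):
    return (n / N) ** C

print(f(1))      # (1/1000)^2 = 1e-06, close to 0
print(f(N - 1))  # 0.999^2 = 0.998001, close to 1
```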
To evaluate P(Y(n)), we take the integral of
(c/N) log(t/N) dt
from 1 to n, and exponentiate it. The result is, up to a multiplicative constant,
exp(c (x log x - x)) = (x/e)^(cx)
where x = n/N. I think it’s a good idea to leave this as a function of x. Write K for the multiplicative constant. We have P(proportion x of the population believes X) = K(x/e)^(cx). A graph of this function for K = 1, c = 1 can be found here, and a graph of its reciprocal (whose relevance is explained in the parent) can be found here.
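This closed form is easy to check numerically. A sketch under illustrative parameter choices (N, c, and n below are arbitrary), comparing a midpoint-rule integration of (c/N) log(t/N) against c(x log x - x), and then confirming the algebraic identity exp(c(x log x - x)) = (x/e)^(cx):

```python
import math

# Sketch: verify that the integral of (c/N) * log(t/N) dt from 1 to n
# matches c*(x*log(x) - x) with x = n/N, up to the additive constant
# coming from the lower endpoint, and that exponentiating it gives
# (x/e)**(c*x). N, c, n are arbitrary illustrative values.
N = 1000
c = 1.0
n = 500
x = n / N

def numeric_integral(n, steps=100_000):
    # midpoint rule for the integral of (c/N) * log(t/N) from 1 to n
    h = (n - 1) / steps
    return h * sum((c / N) * math.log((1 + (k + 0.5) * h) / N)
                   for k in range(steps))

def antiderivative(x):
    return c * (x * math.log(x) - x)

# agreement up to the additive constant from the lower endpoint t = 1
diff = numeric_integral(n) - (antiderivative(x) - antiderivative(1 / N))
print(abs(diff) < 1e-4)  # True

# exp(c*(x*log(x) - x)) equals the claimed closed form (x/e)**(c*x)
print(abs(math.exp(antiderivative(x)) - (x / math.e) ** (c * x)) < 1e-12)  # True
```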
It’s an interesting analysis—have you confirmed the appearance of that distribution with real-world data? I suppose you’d need a substantial body of factual claims about which statistical information is available...
Thanks. I of course have no data, although I think there are lots of surveys done about weird things people believe. But even if this is the correct distribution, I think it would be difficult to fit data to it, because I would guess/worry that the constants K and c would depend on the nature of the claim. (c is so far just an artifact of the ansatz. K is something like P(Y(1)|Y(0)). Different for bigfoot than for Christianity.) Do you have any ideas?