Scott Alexander says:

Suppose I notice I am a human on Earth in America. I consider two hypotheses. One is that everything is as it seems. The other is that there is a vast conspiracy to hide the fact that America is much bigger than I think: it actually contains one trillion trillion people. It seems like SIA should prefer the conspiracy theory (if the conspiracy is too implausible, just increase the posited number of people until it cancels out).

I am often confused by the kind of reasoning at play in the text I bolded. Maybe someone can help sort me out. As I increase the number of people in the conspiracy world, my prior in that world also decreases. If my prior falls faster than the number of people in the considered world grows, I will not be able to construct a conspiracy-world that allows the thought experiment to bite.

Consider the situation where I arrive at the airport, where I will wait in line at security. Wouldn’t I be more likely to discover a line 1000 people long than 100 people long? I am 10x more likely to exist in the longer line. The problem is that our prior on 1000-person security lines might be very low. The reasoning on display in the above passage would invite us to simply crank up the length of the line, say, to 1 million people. I suspect that SIA proponents don’t show up at the airport expecting lines this long. Why? Because the prior on a million-person line is more than ten thousand times lower than the prior on a 100-person line.

This also applies to some presentations of Pascal’s mugging.
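To make the airport arithmetic concrete, here is a minimal sketch of the SIA-style reweighting, where each hypothesis is weighted by its prior times the number of people it contains. The function name `sia_posterior` and the prior values are assumptions picked for illustration, not numbers anyone in the thread proposed.

```python
from fractions import Fraction

def sia_posterior(prior):
    """Reweight a prior over line lengths by the number of people in each line."""
    weighted = {n: p * n for n, p in prior.items()}
    total = sum(weighted.values())
    return {n: w / total for n, w in weighted.items()}

# Hypothetical prior, chosen only for illustration: the million-person line
# is 100,000 times less likely a priori than the 100-person line.
prior = {
    100: Fraction(100_000, 100_001),
    1_000_000: Fraction(1, 100_001),
}

posterior = sia_posterior(prior)
for length, probability in posterior.items():
    print(length, probability)
```

The long line holds 10,000 times as many people, but under this assumed prior it is 100,000 times less likely, so the short line still comes out ahead. If the prior ratio were smaller than the population ratio, the conclusion would flip.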

There’s no principle that says that prior probability of a population exceeding some size N must decrease more quickly than 1/N asymptotically, or any other property of some system. Some priors will have this property, some won’t.

My prior for real-world security lines does have this property, though this cheats a little by being largely founded in real-world experience already. Does my prior for population of hypothetical worlds involving Truman Show style conspiracies (or worse!) have this property? I don’t know—maybe not?

Does it even make sense to have a prior over these? After all, a prior still requires some sort of model that you can use to expect things or not, and I have no reasonable models at all for such worlds. A mathematical “universal” prior like Solomonoff is useless since it’s theoretically uncomputable, and also in a more practical sense utterly disconnected from the domain of properties such as “America’s population”.

On the whole though, your point is quite correct that for many priors you can’t “integrate the extreme tails” to get a significant effect. The tails of some priors are just too thin.
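One way to see the difference between thin and thick tails is a rough numerical sketch. It assumes a power-law prior p(N) proportional to N^(-alpha), which is my choice of family and not something from the thread. The SIA-weighted mass N * p(N) is then proportional to N^(1 - alpha), and its sum converges only when alpha > 2, which matches the condition that P(population > N) falls faster than 1/N. The name `weighted_tail_share` and the truncation at `n_max` are illustrative; the truncation is there only so the sums are finite.

```python
def weighted_tail_share(alpha, cutoff, n_max=1_000_000):
    """Share of the SIA-weighted mass, sum of N * N**(-alpha), that lies
    above `cutoff`, for a power-law prior truncated at `n_max`."""
    weights = [n ** (1.0 - alpha) for n in range(1, n_max + 1)]
    # weights[cutoff:] covers populations cutoff + 1 .. n_max
    return sum(weights[cutoff:]) / sum(weights)

thin = weighted_tail_share(alpha=3.0, cutoff=1000)   # tail thinner than 1/N
thick = weighted_tail_share(alpha=1.5, cutoff=1000)  # tail thicker than 1/N

print(f"alpha=3.0: share of weighted mass above 1000 = {thin:.4f}")
print(f"alpha=1.5: share of weighted mass above 1000 = {thick:.4f}")
```

With alpha = 3 the populations above 1000 carry well under 1% of the weighted mass, so cranking up the population does nothing. With alpha = 1.5 they carry most of it, and that share keeps growing toward 1 as `n_max` is raised.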

While you’re quite right about numbers on the scale of billions or trillions, I don’t think it makes sense in the limit for the prior probability of X people existing in the world to fall faster than X grows in size.

Certain series of large numbers grow larger much faster than they grow in complexity. A program that returns 10^(10^(10^10)) takes fewer bits to specify (relative to most reasonable systems of specifying programs) than a program that returns 32758932523657923658936180532035892630581608956901628906849561908236520958326051861018956109328631298061259863298326379326013327851098368965026592086190862390125670192358031278018273063587236832763053870032004364702101004310417647840155719238569120561329853619283561298215693286953190539832693826325980569123856910536312892639082369382562039635910965389032698312569023865938615338298392306583192365981036198536932862390326919328369856390218365991836501590931685390659103658916392090356835906398269120625190856983206532903618936398561980569325698312650389253839527983752938579283589237325987329382571092301928* - even though 10^(10^(10^10)) is by far the larger number. And it only takes a linear increase in complexity to make it 10^(10^(10^(10^(10^(10^10))))) instead.

*I produced this number via keyboard-mashing; it’s not anything special.
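A quick sketch of the description-length point, using the character count of a Python expression as a crude stand-in for program length. That proxy is an assumption, since real Kolmogorov complexity depends on the reference machine. The helper `tower_expression` is hypothetical, and the code only builds the expression text; it never evaluates the large towers.

```python
def tower_expression(height):
    """Python source text for a power tower of `height` tens,
    e.g. "10**(10**(10))" for height 3."""
    expr = "10"
    for _ in range(height - 1):
        expr = f"10**({expr})"
    return expr

# Each extra level adds a fixed six characters, so length grows linearly
# with height while the value grows by a whole extra exponentiation.
lengths = {height: len(tower_expression(height)) for height in (4, 7)}
for height, length in lengths.items():
    print(height, length, tower_expression(height))
```

By contrast, a number with no pattern has to be written out digit by digit, so a literal with several hundred arbitrary digits costs several hundred characters, far more than either tower.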

Consider the proposition “A superpowered entity capable of creating unlimited numbers of people ran a program that output the result of a random program out of all possible programs (with their outputs rendered as integers), weighted by the complexity of those programs, and then created that many people.”

If this happened, the probability that their program outputs at least X would fall much slower than X rises, in the limit. The sum doesn’t converge at all; the expected number of people created would be literally infinite.

So as long as you assign greater than literally zero probability to that proposition—and there’s no such thing as zero probability—there must exist some number X such that you assign greater than 1/X probability to X people existing. In fact, there must exist some number X such that you assign greater than 1/X probability to X million people existing, or X billion, or so on.
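The divergence can be illustrated with a toy family that is much tamer than "all possible programs". Assume, purely for illustration, one program of each length k with weight 2^-k whose output is 4^k. Real program outputs such as power towers grow far faster, so this understates the effect. The names `toy_model`, `partial_expectation` and `tail_probability` are hypothetical.

```python
from fractions import Fraction

def toy_model(max_length):
    """Yield (k, output, weight): one program per length k, weight 2**-k."""
    for k in range(1, max_length + 1):
        yield k, 4 ** k, Fraction(1, 2 ** k)

def partial_expectation(max_length):
    """Expected output, counting only programs up to `max_length`."""
    return sum(output * weight for _, output, weight in toy_model(max_length))

def tail_probability(k, max_length):
    """P(output >= 4**k) within the truncated toy family."""
    return sum(weight for j, _, weight in toy_model(max_length) if j >= k)

for max_length in (10, 20, 30):
    print(max_length, partial_expectation(max_length))

x = 4 ** 5
print(x * tail_probability(5, 20))  # greater than 1 means P(output >= x) > 1/x
```

Each length k contributes 2^k to the expectation, so the partial sums grow without bound as longer programs are included. Even in this mild toy case, the probability of an output of at least X stays above 1/X.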

(btw, I don’t think that the sort of SIA-based reasoning here is actually valid—but if it was, then yeah, it implies that there are infinite people.)

I think when you get to any class of hypotheses like “capable of creating unlimited numbers of people” with nonzero probability, you run into multiple paradoxes of infinity.

For example, there is no uniform distribution over any countably infinite set, which includes the set of all halting programs. Every non-uniform distribution this hypothetical superbeing may have used over such programs is a different prior hypothesis. The set of these has no suitable uniform distribution either, since they can be partitioned into countably many equivalence classes under natural transformations.

It doesn’t take much study of this before you’re digging into pathologies of measure theory such as Vitali sets and similar.

You can of course arbitrarily pick any of these weightings to be your “chosen” prior, but that’s just equivalent to choosing a prior over population directly so it doesn’t help at all.

Probability theory can’t adequately deal with such hypothesis families, and so if you’re considering Bayesian reasoning you must discard them from your prior distribution. Perhaps there is some extension or replacement for probability that can handle them, but we don’t have one.

I just came across this word from John Koenig’s Dictionary of Obscure Sorrows, which nicely captures the thesis of All Debates Are Bravery Debates.

redesis n. a feeling of queasiness while offering someone advice, knowing they might well face a totally different set of constraints and capabilities, any of which might propel them to a wildly different outcome, which makes you wonder if all of your hard-earned wisdom’s fundamentally nontransferable, like handing someone a gift card in your name that probably expired years ago.

I would really like to see a post from someone in AI policy on “Grading Possible Comprehensive AI Legislation.” The post would lay out what kind of safety stipulations would earn a bill an “A-” vs a “B+”, for example.

I’m imagining a situation where, in the next couple years, a big omnibus AI bill gets passed that contains some safety-relevant components. I don’t want to be left wondering “did the safety lobby get everything it asked for, or did it get shafted?” and trying to construct an answer ex-post.

The point above, made in reply to the Scott Alexander passage quoted at the top, was recently elaborated on here: Pascal’s Mugging and the Order of Quantification

File under ‘noticing the start of an exponential’: A.I. Helped to Find a Vast Source of the Copper That A.I. Needs to Thrive

Today I am thankful that Bayes’ Rule is unintuitive.

Much ink has been spilled complaining that Bayes’ Rule can yield surprising results. As anyone who has taken an introductory statistics class knows, it is difficult to solve a problem that requires an application of Bayes’ Rule without plugging values into the formula, at least for a beginner. Eventually, the student of Bayes may gain an intuition for the Rule (perhaps in odds form), but at that point they can be trusted to wield their intuition responsibly because it was won through disciplined practice.

This unintuitiveness is a feature, not a bug, because it discourages motivated reasoning. If Bayes’ Rule were more intuitive, it would be simple to back out what P(A), P(B), and P(B|A) must be to justify your preferred posterior belief, and then argue for these quantities. It would also be simple to work backwards to select your prediction A from a favorable hypothesis space. Because Bayes’ Rule is unintuitive, these are challenging moves, and formally updating your beliefs is less vulnerable to motivated reasoning.
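For readers who want the formula next to the claim, here is a small worked sketch of Bayes' Rule in both probability and odds form. The numbers (a 1% base rate, a 90% hit rate, a 9% false-positive rate) are invented for illustration, and the function names are hypothetical.

```python
from fractions import Fraction

def posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) by Bayes' Rule, with P(B) expanded by total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

# Illustrative inputs only: P(A) = 1%, P(B|A) = 90%, P(B|not A) = 9%.
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(9, 100))

# Same update in odds form: prior odds 1:99, likelihood ratio 0.9 / 0.09 = 10.
odds = posterior_odds(Fraction(1, 99), Fraction(10))

print(p, float(p))
print(odds, odds / (1 + odds))
```

Both routes give the same posterior, a bit over 9%, which most people find surprisingly low for a 90% hit rate. Running the calculation backwards, picking inputs that land on a posterior you already wanted, takes noticeably more effort than running it forwards.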

Happy Thanksgiving!