Bayes is Out-Dated, and You’re Doing it Wrong

<sharing it here, too, though I can already imagine the reaction...>

~ a community of Bayes-enthusiasts fumble statistical inference ~

TL;DR — Industry uses Dirichlet Process and SAS, NOT Bayes. Bayes is persistently *wrong* and lacks a great deal of important information. Supposed ‘rationalists’ cling to Bayes as the Ultimate Truth, without knowing enough Mathematics to know they’re wrong.

“Oh, well my Prior was <preferred assumption>, but I guess I have to update with that one data-point that wandered into my life,” said multiple ‘Rationalists’ during my year of invading their gatherings.

A weird thing is happening in the Bay Area, slowly creeping into the Zeitgeist: a group of non-mathematicians have decided they found the BEST statistical technique ever, and they want to use it to understand the whole world… but their technique is 260 YEARS OLD, and we’ve done a LOT better since then. It’s called Bayes’ Theorem, published in 1763 — literally 260 candles this year.

Let’s get a sense of just how out-dated and bizarre it is, to insist you have the One-True-Method when it’s 260 years old: back in 1763, when Bayes was published, there was another new-fangled invention sweeping Europe — the Dutch Plough. That’s the plough used today by the Amish. Literally, relying on Bayes to draw conclusions is like farming with an Amish plough; it’s hilariously inadequate, and completely dismissed by industry.

That quote at the top is an amalgam of multiple conversations with the Effective Altruists and Astral Codex Ten ‘Rationalists’ (they made that term up to describe themselves); it’s a persistent theme in their conversations. And, it’s not even the *correct* use of Bayes! Let’s see why:

In Bayes’ Theorem, you begin with a Prior. These Rationalists pick the Prior that they *prefer*. Neutral Bayesian Priors, however, are the average of all possible assumptions, NOT your preferred place to start. These folks’ first step is a disastrous error. Then, when they say “I guess I should update my Prior…”, wait! Why in the world would you ever feel confident about a belief when the ONLY thing you have is a Prior? A Prior is, by definition, the state of “no information,” when one should have intellectual humility, not certainty!

Then, they update their Bayesian estimate using… a *few* examples? The Rationalists repeatedly rely on sparse evidence while claiming certainty, as if “Statistically Significant Sample Size” just isn’t a thing. Bayes doesn’t *need* statistical significance, apparently! Finally, the examples they use are culled from personal experience. I hope I don’t have to explain to anyone why we need a random sample drawn from representative sub-populations. The supposedly rational Bayes-fans fail on every count.
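To put numbers on “sparse evidence,” here is a minimal sketch assuming a two-outcome belief, a flat Beta(1, 1) starting point (every split equally plausible, one common reading of a neutral Prior), and the standard conjugate Beta update. The function name and the example counts are hypothetical, chosen only for illustration.

```python
from math import sqrt


def flat_prior_update(agree: int, total: int) -> tuple[float, float]:
    """Start from a flat Beta(1, 1) prior over the share of cases that agree with a belief,
    update on `agree` agreeing cases out of `total` observations, and return the
    posterior mean and standard deviation."""
    a = 1 + agree
    b = 1 + (total - agree)
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Hypothetical: three anecdotes that all agree vs. a sample where 240 of 300 agree.
for agree, total in ((3, 3), (240, 300)):
    mean, sd = flat_prior_update(agree, total)
    print(f"{agree}/{total}: mean {mean:.3f}, sd {sd:.3f}")
```

Both cases land near 0.8, but the standard deviation is about 0.16 for the three anecdotes versus about 0.023 for the sample of 300: a handful of personal examples leaves the belief wide open.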

So, if they correct those mistakes, can they then rely on Bayes to find their precious truths? Nope. Bayes is consistently wrong, reliably. That’s why industry doesn’t use it: they’d lose money. Dirichlet lets them make money, because it works better. That’s a stronger proof, empirically, than all the rationalizations of their community’s prominent Bayes-trumpeters: a fiction writer and a psych counselor, both of whom lack relevant experience with statistical analysis software and techniques.

In particular, the blog of that psych counselor, “Astral Codex Ten,” has a tag-line: it quotes Bayes’ Theorem and follows with “all else is commentary.” Everyone who reads his blog, and who then DOESN’T check what statistical techniques are used in the real world, stays there as part of the community. They have self-selected for a community of people who call Bayes the be-all-end-all, all of them agreeing they’re right, and they don’t know that they’re horribly wrong… because they don’t check!

Think about this for a moment: if you state Bayes’ Theorem, then claim “all else is commentary” while recommending readers use Bayes, you are implicitly claiming “NO further improvements in statistical analysis have occurred in the 260 years since Bayes was published; Student’s t-distributions, Lévy distributions, they don’t even need to exist!” That’s the core tenet of the Bay Area Rationalists’ luminary, addicted to Bayes.

Wait, so why and how is Dirichlet such an improvement?

Let’s imagine you took a survey in some big city and found (unsurprisingly) a Democratic majority: a 60/40 split, on the nose. That sample’s split is also the “maximum likelihood” estimate for the potential Population. Said another way, “The real-world population which is most likely to give you a 60/40 sample is a 60/40 population.” But does that make 60/40 your best guess for the real population? No.

Imagine each possible population, one at a time. There’s the 100% Democrat population, first — what is the *likelihood* of such a population producing a 6040 sample? Zero. What about 99% Democrat? Well, then it’ll depend upon how *many* people you surveyed, but there is just a tiny chance the real population is 99% Democrat! Keep doing that, for every population, all the way to 99% Republican, then 100% Republican. Whew! Now, you have a *likelihood* distribution, the “likelihood of population X generating sample Y.”
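Here is a minimal sketch of that population-by-population walk. The 60/40 split comes with no sample size, so it assumes a hypothetical survey of 100 people with 60 Democrats, and scores each candidate population share p from 0% to 100% in 1% steps with the binomial likelihood: the chance that such a population produces exactly that sample. The function name `likelihood_table` is illustrative.

```python
from math import comb


def likelihood_table(k: int, n: int, steps: int = 100) -> list[tuple[float, float]]:
    """For each candidate population share p = 0, 1/steps, ..., 1, return (p, chance that
    drawing n people from that population yields exactly k Democrats)."""
    table = []
    for i in range(steps + 1):
        p = i / steps
        table.append((p, comb(n, k) * p**k * (1 - p) ** (n - k)))
    return table


# Hypothetical survey size: 60 Democrats out of 100 respondents.
table = likelihood_table(k=60, n=100)
peak_p, _ = max(table, key=lambda row: row[1])
print(f"most likely population: {peak_p:.2f} Democrat")
for p, lik in (table[100], table[99], table[60], table[50]):
    print(f"p = {p:.2f}: likelihood {lik:.3g}")
```

The 100% Democrat population scores exactly zero, the 99% population scores a vanishingly small number, and the peak sits at the 60% population, the maximum-likelihood estimate.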

When we look at this distribution, for data that falls in two buckets (D/R), we’ll notice something: the *peak* likelihood is at 60/40, but there’s ALSO a bunch of probability-mass on the 50/50 side of the curve, creating a tilt to the over-all probability. While the ‘mode’ of the likelihood distribution is still the 60/40 estimate, the ‘mean’ of that distribution sits closer to 50/50, every time! You *should* expect that the true population is closer to an *equal division* among buckets than the raw sample suggests. When you collect more samples, you narrow that distribution of likelihoods, so you see less drift toward 50/50. That’s the reason you want a ‘statistically significant sample size’.
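A sketch of the mode-versus-mean drift, assuming hypothetical 60/40 samples of 10, 100, and 1,000 people. It normalizes the likelihood over a fine grid of populations so it sums to 1, then reports the peak (mode), the mean, and the standard deviation. For two buckets this normalized likelihood is a Beta distribution, the two-bucket case of the Dirichlet distribution. The function name is illustrative.

```python
from math import exp, log, sqrt


def likelihood_summary(k: int, n: int, steps: int = 10_000) -> dict[str, float]:
    """Normalize the likelihood of seeing k Democrats out of n over a grid of populations p,
    then report its mode (peak), mean, and standard deviation (spread)."""
    grid = [(i + 0.5) / steps for i in range(steps)]  # midpoints keep log(p), log(1 - p) finite
    # Log-likelihood up to a constant; working in logs avoids underflow for large n.
    logs = [k * log(p) + (n - k) * log(1 - p) for p in grid]
    top = max(logs)
    weights = [exp(v - top) for v in logs]
    total = sum(weights)
    probs = [w / total for w in weights]
    mode = grid[logs.index(top)]
    mean = sum(p * w for p, w in zip(grid, probs))
    sd = sqrt(sum((p - mean) ** 2 * w for p, w in zip(grid, probs)))
    return {"mode": mode, "mean": mean, "sd": sd}


for n in (10, 100, 1000):  # hypothetical sample sizes, each with a 60/40 split
    s = likelihood_summary(k=6 * n // 10, n=n)
    print(f"n={n:5d}  mode={s['mode']:.3f}  mean={s['mean']:.3f}  sd={s['sd']:.3f}")
```

The mode stays at about 0.60 for every sample size, while the mean comes out near (k + 1)/(n + 2): roughly 0.583 for 6 of 10, 0.598 for 60 of 100, and 0.5998 for 600 of 1,000. The standard deviation shrinks as n grows, which is the narrowing described above.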

Let’s look at that other aspect Dirichlet possesses, which Bayes wholly lacks: Confidence!

When you look at the likelihood of each population, the chance of it producing your observed sample, you can also ask: “How far AWAY from our best guess would we need to place boundaries, such that we include 95% of the possible populations’ likelihoods within our bounds?” That’s called your Confidence Interval! You may have only learned the trimmed-down simplicities and z-score tables in your Stat 101 class, but there’s a reason they can claim confidence: that interval of population-estimates contains 95% of the likelihood-distribution’s probability-mass!
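A minimal sketch of that 95% boundary search, assuming the same hypothetical 60-of-100 survey: it accumulates the normalized likelihood across a grid of populations and cuts 2.5% off each tail. For comparison, `wald_interval` is the Stat 101 z-score version (estimate ± 1.96 standard errors). Both function names are illustrative, not from any library.

```python
from math import exp, log, sqrt


def likelihood_interval(k: int, n: int, mass: float = 0.95, steps: int = 10_000) -> tuple[float, float]:
    """Equal-tailed interval holding `mass` of the likelihood of k-of-n, normalized over population shares p."""
    grid = [(i + 0.5) / steps for i in range(steps)]  # midpoints keep log(p), log(1 - p) finite
    logs = [k * log(p) + (n - k) * log(1 - p) for p in grid]
    top = max(logs)  # subtract the peak before exp() to avoid underflow
    weights = [exp(v - top) for v in logs]
    total = sum(weights)
    tail = (1 - mass) / 2
    lo = None
    running = 0.0
    for p, w in zip(grid, weights):
        running += w / total  # cumulative share of the likelihood mass up to p
        if lo is None and running >= tail:
            lo = p
        if running >= 1 - tail:
            return lo, p
    return lo, grid[-1]


def wald_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Stat 101 normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = k / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


print("likelihood mass:", likelihood_interval(60, 100))
print("z-score table:  ", wald_interval(60, 100))
```

For 60 of 100 the two intervals come out close to each other (the z-score version runs from about 0.504 to 0.696). With small samples or extreme splits the shortcut breaks down: for 0 of 10, the z-score interval collapses to a single point at 0.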

Finally, let’s consider “the cost of being wrong.” Bayes doesn’t balance your prediction according to the cost of being wrong; with Dirichlet’s distribution over potential populations, you can weight each potential population’s probability by the cost of that error-distance, add those up for each candidate estimate, and pick the candidate with the lowest total. That estimate will “minimize the COST of being WRONG.” You can even use costs which are discontinuous or defined over ranges, producing high and low bounds and nuanced thresholds of risk. Definitely better than Bayes.
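A minimal sketch of weighing the cost of being wrong, assuming the same hypothetical 60-of-100 survey. For each candidate estimate a, it adds up each potential population’s probability times the cost of guessing a when the truth is p, then keeps the candidate with the smallest expected cost. The two cost functions are illustrative assumptions: a squared error, and an asymmetric cost where underestimating the Democrat share costs three times as much as overestimating it.

```python
from math import exp, log


def normalized_likelihood(k: int, n: int, steps: int = 500) -> tuple[list[float], list[float]]:
    """Grid of population shares p (midpoints, so the logs stay finite) and the likelihood
    of k-of-n at each p, normalized to sum to 1."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    logs = [k * log(p) + (n - k) * log(1 - p) for p in grid]
    top = max(logs)
    weights = [exp(v - top) for v in logs]
    total = sum(weights)
    return grid, [w / total for w in weights]


def min_expected_cost(grid, probs, cost):
    """Return the candidate estimate a (drawn from the same grid) with the smallest
    expected cost: sum over populations p of prob(p) * cost(a, p)."""
    def expected(a):
        return sum(w * cost(a, p) for p, w in zip(grid, probs))
    return min(grid, key=expected)


def squared_cost(a, p):
    return (a - p) ** 2


def asymmetric_cost(a, p):
    # Hypothetical: guessing too low costs 3x as much per point as guessing too high.
    return 3 * (p - a) if p > a else a - p


grid, probs = normalized_likelihood(60, 100)  # hypothetical 60-of-100 survey
print("squared cost ->", min_expected_cost(grid, probs, squared_cost))
print("asymmetric cost ->", min_expected_cost(grid, probs, asymmetric_cost))
```

Squared cost lands next to the mean of the distribution (about 0.598); the asymmetric cost pushes the estimate up to roughly the 75th percentile, around 0.63. A step function or range-based penalty only changes the `cost` argument.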

Now, Dirichlet isn’t even the be-all-end-all… the Dirichlet Process was published in 1973, 50 years old THIS year! SAS has guarded trade secrets since the ’70s, and invests 2.5x more into R&D than the TECH-industry average! If you want to pass muster for pharmaceuticals in front of the FDA, you send all your data to SAS. It’s required, because they’re soooo damn GOOD! So, unless you work at SAS (which has the highest profits per employee hour of all companies on Earth, and has expanded consistently since 1976… consistently rated one of the best employers on the planet…), you DON’T know the be-all-end-all statistical technique, and neither do Scott Alexander or Eliezer Yudkowsky, as much as they’d like you to believe otherwise. Just for reference, when “you think you’re right BECAUSE you don’t know enough to know you’re wrong,” that’s called the Dunning-Kruger Effect, dear Rationalists.