shortest goddamn bayes guide ever
// ODDS = YEP:NOPE
YEP, NOPE = MAKE UP SOME INITIAL ODDS WHO CARES
FOR EACH E IN EVIDENCE
YEP *= CHANCE OF E IF YEP
NOPE *= CHANCE OF E IF NOPE
ASSUME E // DO NOT DOUBLE COUNT
The thing to remember is that yeps and nopes never cross. The colon is a thick & rubbery barrier. Yep with yep and nope with nope.
bear : notbear =
1:100 odds to encounter a bear on a camping trip around here in general
* 20% a bear would then scratch my tent : 50% a notbear would
* 10% a bear would then flip my tent over : 1% a notbear would
* 95% a bear would then look exactly like a fucking bear inside my tent : 1% a notbear would
* 0.01% chance a bear would then eat me alive : 0.001% chance a notbear would
As you die you conclude 1*20*10*95*.01 : 100*50*1*1*.001 = 190 : 5 odds that a bear is eating you.
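For anyone who wants the loop as actual runnable code, here is a minimal Python sketch of the same naive odds update, with the bear numbers plugged in (the function and variable names are mine, not part of the guide):

def update_odds(yep, nope, evidence):
    # evidence is a list of (chance of E if YEP, chance of E if NOPE) pairs
    # yep with yep, nope with nope: the two sides never cross
    for p_if_yep, p_if_nope in evidence:
        yep *= p_if_yep
        nope *= p_if_nope
    return yep, nope

# The bear: 1:100 prior, then the four observations.
print(update_odds(1, 100, [(20, 50), (10, 1), (95, 1), (0.01, 0.001)]))
# -> (190.0, 5.0), i.e. 190:5 odds that a bear is eating you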
Kicks open the door
Alright, here’s the current state of affairs:
Premise: Bayes Theorem is held up as the most important part of rationality. ACX has the tagline “P(A|B) = [P(A)P(B|A)]/P(B), all the rest is commentary.” Yudkowsky mentions it approximately a million times in the Sequences.
Premise: The average rationalist cannot use Bayes theorem. No, I will be stronger and more specific: when I ask a room full of people at a LessWrong meetup whether they can write the formula for Bayes Theorem on a piece of paper, less than half of them can.
Conclusion: Less than half of the rationalist community understands the most important part of rationality.
Or in other words, we suck. Lest anyone think I’m merely throwing stones, I screwed up Bayes the first time I tried to use it in public. I would not bet a lot on me getting any particular problem right. I suck too.
This version though? This I think most people could remember. I can do this version in my head. I’ve read a half-dozen explainers for Bayes, some with very nice pictures. This beats all of them, and it’s in less than two hundred words! Maybe this is a case of Writing A Thousand Roads To Rome where this version happened to click with me but it’s fundamentally just as good as many other versions. I suspect this is a simpler formulation.
Either someone needs to point out where this math is wrong, or I’m just going to use this version for myself and for explaining it to others. A much simpler version of the only non-commentary part of rationality seems a worthy use of Best of LessWrong space to me.
By “most people” you mean most people hanging around the LessWrong community because they know programming? I agree, an explanation that uses language the average programmer can understand seems like a good strategy for explaining Bayes rule, given the rationality community’s demographics (above-average programmers).
Was it the code or the example that helped? The code is mostly fine. I don’t think it is any simpler than the explanations here, the notation just looks scarier.
This version is correct for naive Bayes, but naive Bayes is in fact naive and can lead you arbitrarily astray. If you wanted a non-naive version, you would write something like this in pseudopython:
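def non_naive_update(yep, nope, evidence, p_given_yep, p_given_nope):
    # Sketch only: the non-naive fix is that each likelihood conditions
    # on everything observed so far, not just on the hypothesis.
    seen = []
    for e in evidence:
        yep *= p_given_yep(e, seen)    # P(e | YEP and all of seen)
        nope *= p_given_nope(e, seen)  # P(e | NOPE and all of seen)
        seen.append(e)
    return yep, nope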
I see the case for starting with the naive version though, so this is more of a minor thing.
I don’t see a lot more going for the bear example except for it being about something dramatic, so more memorable. Feels like you should be able to do strictly better examples. See Zane’s objections in the other comment.
It’s not the programming notation that makes it work for me (though that helps a little.) It’s not the particular example either, though I do think it’s a bit better than the abstract mammogram example. There’s just way fewer numbers.
It’s because the notation on each line contains two numbers, both of which are... primitives? atomic pieces? I can do them in one step. (My inner monologue goes something like “3:2 means the first thing happens three times for every two times the second thing happens, over the long run anyway. There’s five balls in the bag, three of the first colour and two of the second. Now more balls than that, but keep the ratio.”)
And then if I want to do an update, I just need four numbers, each of which makes sense on their own, each of which is used in one place. 1:100, 20:50, multiply the left by the left (20) and the right by the right (5000) and now I have two numbers again. (20:5000) I can usually simplify that in my head (2:500, okay now 1:250.) The line “The colon is a thick & rubbery barrier. Yep with yep and nope with nope” helps a lot, I’m reminded to keep all the yeps on the left and the nopes on the right. Because multiplication is associative, I can just keep doing that at each new update, never dealing with more than four numbers. If I’d rather (or if I’m using pen and paper) I can just write out a dozen updates and get the products after.
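That simplifying step is just dividing both sides by their greatest common divisor, if you want it as code (names mine):

from math import gcd

def simplify(yep, nope):
    g = gcd(yep, nope)     # 20:5000 -> divide both sides by 20
    return yep // g, nope // g

print(simplify(20, 5000))  # (1, 250)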
Compare this sucker:
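P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|¬A)P(¬A)]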
Four numbers used in six places. I’ll be the village idiot and admit I cannot reliably keep a phone number in my head without mental tricks. I have lost count of the number of times I have swapped P(A|B) and P(B|A) accidentally. The numbers aren’t arranged on the page in a way that helps my intuition, like yeps being on top and nopes on the bottom or something.
Or compare the explanation at the first link you shared.
O(H) × L_e(H) = O(H|e)
O(H) × L_e1(H) × L_e2(H∧e1) × L_e3(H∧e1∧e2) = O(H|e1∧e2∧e3)
I am trying to express that I find that more complicated. I don’t know what L_e means. It took me a bit to remember what O stands for. If you are ever trying to explain something to the general population and you need LaTeX to do it, stop what you are doing and come up with a new plan. Seven paragraphs into that page we get the odds form with the colon, and it’s for three different hypotheses; I’m aware you can write odds like 3:2:4, but that’s less common. Drunk people who flunked high school routinely calculate 3:2 in pubs! Start with the two-hypothesis version, then maybe mention that you can do three hypotheses at once. “Shortest Goddamn Bayes Guide Ever” uses strictly symbols on a standard keyboard and math which is within the limits of an on-track fourth grader. It’s less than two hundred words! The thing would fit in three tweets!
I think that is a masterwork of pedagogy and editing, worthy of praise and prominent place.
If there’s a way to make this version work for non-naive updates, that seems good, and my understanding is it’s mostly about saying for each new line “given that the above has happened, what are the odds of this observation?” instead of “what are the odds of this observation, assuming I haven’t seen the above?” It’s not like the P(A|B) formulation prevents people from making that exact mistake. (Citation: I have made that exact mistake.)
Interesting! Makes sense.
Yes that’s it. Yeah I am not trying to defend the probability version of bayes rule. When I was trying to explain bayes rule to my wordcel gf, I was also using the odds ratio.
Yes, odds notation is the only sane way to do Bayes. Who cares about Bayes theorem written out in math. Just think about hypotheses and likelihoods. If you need to rederive the math notation start from thinking in odds and rederive what you would do to get a probability out of odds.
I sure do feel confused why so many people mess up Bayes. The core of Bayesian reasoning is literally just asking the question “what is the probability that I would see this evidence given each one of my hypotheses?”, or, in the case of a reasonable null hypothesis and a hypothesized conjecture, the question “would I be seeing anything different if I was wrong/if this was just noise?”
To be clear, this is also what is in all the standard Bayes guides. Eliezer’s Bayes guides, on both Yudkowsky.net and Arbital.com, are centrally about the odds notation.
I’m confused and alarmed that there is apparently some very large group of people who consider themselves rationalists but do not understand Bayes theorem. (Bayesian statistics as a whole is a lot more complicated, of course, but Bayes theorem is not the hard part.) It’s not a particularly complicated piece of math! The core idea can definitely trip you up if you’ve never ever heard of it, but it’s also not that deep and shouldn’t be hard to explain. And even if you don’t remember the exact formula, it should be very easy to rederive within a few minutes from first principles once you understand the core idea.
How did we end up in a world where a community that attracts people of above-average intelligence and education, that places a large emphasis on math and STEM, that worships a particular theorem for some reason and has produced dozen(s?) of explainers for that theorem, still ends up having its median member not understand the theorem? I think the missing thing here is not the one true intuitive Bayes guide that will once and for all explain things the right way.
I am, if not alarmed, at least considering it a problem, but I haven’t felt confusion here for at least a year. I have a pretty good model of how it happens. Someone’s doing some searching on the internet, and gets recommended a LessWrong article on Boston rents, or an AI paper, or hikers going missing. Maybe a friend recommended them a fun essay on miracles or a goofy Harry Potter fanfic. They hang around, read a few more things, comment a bit. Then they see a meetup announcement, and show up, and enjoy the conversation. (Very, very roughly, a third of LessWrong/ACX meetups are socials, with no or minimal readings or workshops.) They go to more meetups, they make more comments on the internet, maybe they make some posts of their own and their posts get upvoted. Maybe they step up and run the meetup when the previous organizer is sick or busy or moves.
At no point did someone give them a math test. I’m basically describing my arc above, and nobody asked me to solve a mammogram problem in that process.
That’s how we end up in this world.
As for what the missing thing is: my theory is that to change this state of affairs, we’d need two things. We’d need to start actually regularly asking folks questions where they’d need to use it, and we’d need an explanation fast and simple enough that it can survive being taught by non-specialists who are also juggling putting snacks out and getting the door for people. I love this version not for its intuitiveness, but for rearranging the numbers into a shape people can work with more easily.
I’d give much higher odds on members of the community being able to gesture at the key ideas of base rates and priors in English sentences! (Not as high as I’d like, but higher, anyway.) But that’s not the same as being able to do the calculations. And there’s something slippery about describing a piece of math in intuitive sentences then trying to use it as a heuristic without quite being able to actually run the numbers, which is why I’d like to change that.
Again, I suck too! I’m running around doing a dozen things in my day-to-day life, none of which is remedial math practice. This kind of thing happens a lot, actually. Once upon a time I did some basic interviews for software developer positions, and watched comp sci grads fail to FizzBuzz correctly.
My hope is that if somehow I can get a tweet or two worth of text that teaches the numbers in a way that fits within the math people already do in their daily lives (multiplying two to four numbers) and add a small battery of exercises that use it, I might be able to package that in a way local organizers not only could use but would spread. Like you say, maybe hoping for just one more Bayes explanation is not the path. To me, this one was a meaningful step simpler and easier.
I guess I’ll note as well that I want to raise the sanity waterline. To do that, I can’t work with a version that demands above-average intelligence. I do genuinely want to figure out how to teach Bayes to fourth graders and then go out and teach some fourth graders. C’mon, don’t you want to see what people turn out like if they have access to a better mental toolkit from a young age?
Also,
I think you might be having an xkcd feldspar moment.
I’m 13 and I consider myself to have a sufficient understanding of Bayes, or at least I’d be able to write it out and use it in basic situations. This community seems to be filled with pretty smart people, though I haven’t been to enough events to make a sustained argument on this. I find this guide to be even more confusing than learning the formula itself, but maybe that’s just my perception.
Some cruxes for me:
Bayes is the most important part of rationality.
Maybe you think it’s something else? If so, what? (Not a rhetorical question; I have three or four candidates myself.)
Being able to do the math is important.
Maybe you think the verbal version is sufficient?
Maybe you think people don’t actually do the math much even when they can, so it doesn’t matter?
This formulation is a meaningful step forward.
Maybe you think the theorem version is fine and people can rederive the odds version?
Maybe you think mentioning the odds version partway through the other guides is fine?
Most rationalists can’t write out Bayes Theorem, and can’t correctly answer a mammogram problem.
Maybe you think they can? (I invite you to test this at your local LessWrong meetup and report back! If you want to make a bet about it, I can probably get a lot of LessWrong meetups to test this and report back!)
Maybe you object to “rationalist” being defined by ~self-identification, such that if someone can’t write it out or answer the mammogram problem they don’t count as a rationalist?
Here’s a visual description: Imagine all worlds, before you see evidence, cut into two: YEP and NOPE. The ratio of how many are in each (aka probability mass or size) represents the prior odds. Now, you see some evidence E (e.g. a metal detector beeping), so we want to know the ratio after seeing it.
Each part of the prior cut produces worlds with E (e.g. produces beeps). A YEP produces (chance of E if YEP) worth of E worlds, while a NOPE produces (chance of E if NOPE) worth.
And thus the new ratio is the product.
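In symbols, the new ratio is (YEP × chance of E if YEP) : (NOPE × chance of E if NOPE), which is exactly the loop body in the guide.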
In case you don’t know what odds are, they express a ratio using a pair of numbers where the overall scale is irrelevant, e.g. 1:2 and 2:4 represent the same ratio. Probabilities are the values when you scale so that the sum over all outcomes is 1, so in this case 1:2 = 1/3 : 2/3 and the probabilities are 1/3, 2/3.
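If it helps, that scaling is a one-liner in Python (names mine, just an illustration):

def odds_to_probs(*parts):
    total = sum(parts)                      # scale so everything sums to 1
    return tuple(p / total for p in parts)

print(odds_to_probs(1, 2))    # (0.333..., 0.666...)
print(odds_to_probs(190, 5))  # the bear: about 0.974 bear, 0.026 notbear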
In my opinion, the odds form is the superior form, because it’s very easy to use and remember and “philosophically speaking” relative probabilityness is possibly more fundamental. Even at higher levels it’s often more practical. I see it as a pedagogical mistake that Bayes theorem is usually first explained in probability form—even on this site! Basic things should be deeply understood by ~everyone.
0/8
This seems like a low-quality comment. First because there’s no sentence or reasoning. Second because you say 0/8, and the voting is out of 9, which leaves me a little confused and wondering whether you mean something else entirely.