# 0 And 1 Are Not Probabilities

One, two, and three are all integers, and so is negative four. If you keep counting up, or keep counting down, you’re bound to encounter a whole lot more integers. You will not, however, encounter anything called “positive infinity” or “negative infinity,” so these are not integers.

Positive and negative infinity are not integers, but rather special symbols for talking about the behavior of integers. People sometimes say something like, “5 + infinity = infinity,” because if you start at 5 and keep counting up without ever stopping, you’ll get higher and higher numbers without limit. But it doesn’t follow from this that “infinity − infinity = 5.” You can’t count up from 0 without ever stopping, and then count down without ever stopping, and then find yourself at 5 when you’re done.

From this we can see that infinity is not only not-an-integer, it doesn’t even *behave* like an integer. If you unwisely try to mix up infinities with integers, you’ll need all sorts of special new inconsistent-seeming behaviors which you don’t need for 1, 2, 3 and other *actual* integers.

Even though infinity isn’t an integer, you don’t have to worry about being left at a loss for numbers. Although people have seen five sheep, millions of grains of sand, and septillions of atoms, no one has ever counted an infinity of anything. The same with continuous quantities—people have measured dust specks a millimeter across, animals a meter across, cities kilometers across, and galaxies thousands of lightyears across, but no one has ever measured anything an infinity across. In the real world, you don’t *need* a whole lot of infinity.^{1}

In the usual way of writing probabilities, probabilities are between 0 and 1. A coin might have a probability of 0.5 of coming up tails, or the weatherman might assign probability 0.9 to rain tomorrow.

This isn’t the only way of writing probabilities, though. For example, you can transform probabilities into odds via the transformation O = (P/(1 - P)). So a probability of 50% would go to odds of 0.5/0.5 or 1, usually written 1:1, while a probability of 0.9 would go to odds of 0.9/0.1 or 9, usually written 9:1. To take odds back to probabilities you use P = (O∕(1 + O)), and this is perfectly reversible, so the transformation is an isomorphism—a two-way reversible mapping. Thus, probabilities and odds are isomorphic, and you can use one or the other according to convenience.
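As a minimal sketch of the two transformations above, the round trip can be written in a few lines of Python. The function names `prob_to_odds` and `odds_to_prob` are illustrative choices, not anything from a standard library, and the input checks reflect the fact that P = 1 has no finite odds.

```python
def prob_to_odds(p: float) -> float:
    """Map a probability to odds via O = P / (1 - P)."""
    if not 0.0 <= p < 1.0:
        # P = 1 would require dividing by zero: there is no finite odds value.
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def odds_to_prob(o: float) -> float:
    """Map odds back to a probability via P = O / (1 + O)."""
    if o < 0.0:
        raise ValueError("odds must be non-negative")
    return o / (1.0 + o)


for p in (0.5, 0.9):
    o = prob_to_odds(p)
    # Floating-point rounding means the round trip is exact only up to tiny errors.
    print(f"P = {p} -> odds = {o:.4f} -> back to P = {odds_to_prob(o):.4f}")
```

Note that `prob_to_odds` refuses P = 1 outright, while every finite odds value maps back to a probability strictly below 1.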

For example, it’s more convenient to use odds when you’re doing Bayesian updates. Let’s say that I roll a six-sided die: If any face except 1 comes up, there’s a 10% chance of hearing a bell, but if the face 1 comes up, there’s a 20% chance of hearing the bell. Now I roll the die, and hear a bell. What are the *odds* that the face showing is 1? Well, the prior odds are 1:5 (corresponding to the real number 1/5 = 0.20) and the likelihood ratio is 0.2:0.1 (corresponding to the real number 2) and I can just multiply these two together to get the posterior odds 2:5 (corresponding to the real number 2/5 or 0.40). Then I convert back into a probability, if I like, and get (0.4/1.4) = 2/7 ≈ 29%.
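The die-and-bell update can be checked with exact fractions. This is only a sketch: the variable names are made up for illustration, and the second half recomputes the same answer with Bayes’s Theorem in its probability form as a cross-check.

```python
from fractions import Fraction

# Odds form: posterior odds = prior odds * likelihood ratio.
prior_odds = Fraction(1, 5)                            # face 1 vs. any other face
likelihood_ratio = Fraction(2, 10) / Fraction(1, 10)   # P(bell | 1) / P(bell | not 1) = 2
posterior_odds = prior_odds * likelihood_ratio         # 2/5
posterior_prob = posterior_odds / (1 + posterior_odds) # 2/7

# Probability form, for comparison: P(1 | bell) = P(bell | 1) P(1) / P(bell).
p_one = Fraction(1, 6)
p_bell = p_one * Fraction(2, 10) + (1 - p_one) * Fraction(1, 10)
bayes_prob = p_one * Fraction(2, 10) / p_bell

print(posterior_odds, posterior_prob, bayes_prob)
print(f"{float(posterior_prob):.3f}")
```

Both routes agree; the odds route simply skips computing the normalizing term `p_bell`.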

So odds are more manageable for Bayesian updates: if you use probabilities, you’ve got to deploy Bayes’s Theorem in its complicated version. But probabilities are more convenient for answering questions like “If I roll a six-sided die, what’s the chance of seeing a number from 1 to 4?” You can add up the probabilities of 1/6 for each side and get 4/6, but you can’t add up the odds ratios of 0.2 for each side and get an odds ratio of 0.8.
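To see concretely why adding odds fails, here is a small sketch using exact fractions. The helper names are hypothetical; the point is only that summing per-face odds gives a number that corresponds to the wrong probability.

```python
from fractions import Fraction


def prob_to_odds(p):
    return p / (1 - p)


def odds_to_prob(o):
    return o / (1 + o)


p_side = Fraction(1, 6)
p_one_to_four = 4 * p_side                    # probabilities of disjoint events add: 2/3

odds_side = prob_to_odds(p_side)              # 1/5, i.e. 0.2 per face
naive_odds_sum = 4 * odds_side                # 4/5, i.e. 0.8
correct_odds = prob_to_odds(p_one_to_four)    # 2, i.e. 2:1

print("correct odds:", correct_odds)
print("naive sum of odds:", naive_odds_sum,
      "-> implied probability", odds_to_prob(naive_odds_sum))
```

The naive sum of 0.8 would imply a probability of 4/9 rather than the correct 4/6.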

Why am I saying all this? To show that “odds ratios” are just as legitimate a way of mapping uncertainties onto real numbers as “probabilities.” Odds ratios are more convenient for some operations, probabilities are more convenient for others. A famous proof called Cox’s Theorem (plus various extensions and refinements thereof) shows that all ways of representing uncertainties that obey some reasonable-sounding constraints end up isomorphic to each other.

Why does it matter that odds ratios are just as legitimate as probabilities? Probabilities as ordinarily written are between 0 and 1, and both 0 and 1 look like they ought to be readily reachable quantities; it’s easy to see 1 zebra or 0 unicorns. But when you transform probabilities into odds ratios, 0 goes to 0, but 1 goes to positive infinity. Now absolute truth doesn’t look like it should be so easy to reach.

A representation that makes it even simpler to do Bayesian updates is the log odds—this is how E. T. Jaynes recommended thinking about probabilities. For example, let’s say that the prior probability of a proposition is 0.0001—this corresponds to a log odds of around −40 decibels. Then you see evidence that seems 100 times more likely if the proposition is true than if it is false. This is 20 decibels of evidence. So the posterior odds are around −40 dB + 20 dB = −20 dB, that is, the posterior probability is ~0.01.
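The decibel arithmetic above can be sketched as follows. The convention assumed here is that log odds in decibels means 10 · log10 of the odds, which is what makes a prior of 0.0001 come out near −40 dB; the function names are illustrative.

```python
import math


def prob_to_db(p: float) -> float:
    """Log odds in decibels: 10 * log10(P / (1 - P)). Assumes 0 < p < 1."""
    return 10.0 * math.log10(p / (1.0 - p))


def db_to_prob(db: float) -> float:
    """Invert the decibel log odds back to a probability."""
    odds = 10.0 ** (db / 10.0)
    return odds / (1.0 + odds)


prior_db = prob_to_db(0.0001)            # close to -40 dB
evidence_db = 10.0 * math.log10(100.0)   # likelihood ratio of 100 is 20 dB
posterior_db = prior_db + evidence_db    # updating is just addition

print(f"prior: {prior_db:.2f} dB, evidence: {evidence_db:.2f} dB")
print(f"posterior: {posterior_db:.2f} dB, probability ~ {db_to_prob(posterior_db):.4f}")
```

In this representation a Bayesian update is a single addition, and several independent pieces of evidence would simply be summed.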

When you transform probabilities to log odds, 0 goes to negative infinity and 1 goes to positive infinity. Now both infinite certainty and infinite improbability seem a bit more out-of-reach.

In probabilities, 0.9999 and 0.99999 seem to be only 0.00009 apart, so that 0.502 is much further away from 0.503 than 0.9999 is from 0.99999. To get to probability 1 from probability 0.99999, it seems like you should need to travel a distance of merely 0.00001.

But when you transform to odds ratios, 0.502 and 0.503 go to 1.008 and 1.012, and 0.9999 and 0.99999 go to 9,999 and 99,999. And when you transform to log odds, 0.502 and 0.503 go to 0.03 decibels and 0.05 decibels, but 0.9999 and 0.99999 go to 40 decibels and 50 decibels.
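A short sketch comparing the two pairs of probabilities makes the contrast explicit. It again assumes decibels are 10 · log10 of the odds; `evidence_gap_db` is a hypothetical helper name.

```python
import math


def decibels(p: float) -> float:
    """Log odds in decibels, assuming 0 < p < 1."""
    return 10.0 * math.log10(p / (1.0 - p))


def evidence_gap_db(p_from: float, p_to: float) -> float:
    """Decibels of evidence needed to move from p_from to p_to."""
    return decibels(p_to) - decibels(p_from)


for a, b in [(0.502, 0.503), (0.9999, 0.99999)]:
    print(f"{a} -> {b}: probability gap {b - a:.5f}, "
          f"evidence gap {evidence_gap_db(a, b):.3f} dB")
```

The pair that looks closer together as probabilities turns out to be hundreds of times further apart when measured in decibels of evidence.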

When you work in log odds, **the distance between any two degrees of uncertainty equals the amount of evidence you would need to go from one to the other**. That is, the log odds gives us a natural measure of spacing among degrees of confidence.

Using the log odds exposes the fact that reaching infinite certainty requires infinitely strong evidence, just as infinite absurdity requires infinitely strong counterevidence.

Furthermore, all sorts of standard theorems in probability have special cases if you try to plug 1s or 0s into them—like what happens if you try to do a Bayesian update on an observation to which you assigned probability 0.
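As one illustration of such a special case, here is a minimal sketch of Bayes’s Theorem for a single hypothesis H. The function `bayes_update` is a made-up name. It shows two things: a prior of exactly 1 cannot be moved by any evidence, and an observation that was assigned probability 0 leaves the update undefined.

```python
def bayes_update(prior: float, p_obs_given_h: float, p_obs_given_not_h: float) -> float:
    """Return P(H | obs) = P(obs | H) P(H) / P(obs)."""
    p_obs = prior * p_obs_given_h + (1.0 - prior) * p_obs_given_not_h
    if p_obs == 0.0:
        # The observation was assigned probability 0, so the ratio is 0/0.
        raise ZeroDivisionError("cannot update on an observation of probability 0")
    return prior * p_obs_given_h / p_obs


# An ordinary update moves the probability.
print(bayes_update(0.5, 0.2, 0.1))

# A prior of exactly 1 stays at 1, even when the evidence strongly favors not-H.
print(bayes_update(1.0, 0.01, 0.99))

# With prior 1 and an observation that H says is impossible, P(obs) = 0.
try:
    bayes_update(1.0, 0.0, 0.5)
except ZeroDivisionError as err:
    print("undefined:", err)
```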

So I propose that it makes sense to say that 1 and 0 are not in the probabilities; just as negative and positive infinity, which do not obey the field axioms, are not in the real numbers.

The main reason this would upset probability theorists is that we would need to rederive theorems previously obtained by assuming that we can marginalize over a joint probability by adding up all the pieces and having them sum to 1.

However, in the real world, when you roll a die, it doesn’t literally have infinite certainty of coming up some number between 1 and 6. The die might land on its edge; or get struck by a meteor; or the Dark Lords of the Matrix might reach in and write “37” on one side.

If you made a magical symbol to stand for “all possibilities I haven’t considered,” then you could marginalize over the events including this magical symbol, and arrive at a magical symbol “T” that stands for infinite certainty.

But I would rather ask whether there’s some way to derive a theorem without using magic symbols with special behaviors. That would be more elegant. Just as there are mathematicians who refuse to believe in the law of the excluded middle or infinite sets, I would like to be a probability theorist who doesn’t believe in absolute certainty.

^{1}I should note for the more sophisticated reader that they do not need to write me with elaborate explanations of, say, the difference between ordinal numbers and cardinal numbers. I’m familiar with the different set-theoretic notions of infinity, but I don’t see a good use for them in probability theory.

hmm… I feel even more confident about the existence of probability-zero statements than I feel about the existence of probability-1 statements. Because not only do we have logical contradictions, but we also have incoherent statements (like Husserl’s “the green is either”).

Can one form subjective probabilities over the truth of “the green is either” at all? I don’t think so, but I remember a some-months-ago suggestion of Robin’s about “impossible possible worlds,” which might also imply the ability to form probability estimates over incoherencies. (Why not incoherent worlds? One might ask.) So the idea is at least potentially on the table.

And then it seems obvious that we will forever, across all space and time, have no evidence to support an incoherent proposition. That’s as good an approximation of infinite lack of evidence as I can come up with. P(“the green is either”)=0?

If you assign 0 to logical contradictions, you should assign 1 to the negations of logical contradictions. (Particularly since your confidence in bivalence and the power of negation is what allowed you to doubt the truth of the contradiction in the first place.) So it’s strange to say that you feel safer appealing to 0s than to 1s.

For my part, I have a hard time convincing myself that there’s simply no (epistemic) chance that Graham Priest is right. On the other hand, assigning any value but 1 to the sentence “All bachelors are bachelors” just seems perverse. It seems as though I could only get that sentence wrong if I misunderstand it. But what am I assigning a probability to, if not the truth of the sentence *as I understand it*?

Another way of saying this is that I feel queasy assigning a nonzero probability to “Not all bachelors are bachelors” (i.e., ¬(p → p)), even though I think it probably makes *some* sense to entertain as a vanishingly small possibility “All bachelors are non-bachelors” (i.e., p → ¬p, all bachelors are contradictory objects).

If a statement is logically inconsistent with itself, it should not be part of your hypothesis space, and thus should not be assigned a probability at all.

One answer would be that an incoherent proposition is not a proposition, and so doesn’t have any probability (not even zero, if zero is a probability.)

Another answer would be that there is some probability that you are wrong that the proposition is incoherent (you might be forgetting your knowledge of English), and therefore also some probability that “the green is either” is both coherent and true.

It’s difficult to assign probability to incoherent statements, because since we can’t mean anything by them, we can’t assert a referent to the statement—in that sense, the probability is indeterminate (additionally, one could easily imagine a language in which a statement such as “the green is either” has a perfectly coherent meaning—and we can’t say that’s not what we meant, since we didn’t mean anything). Recall also that each probability zero statement implies a probability one statement by its denial and vice versa, so one is equally capable of imagining them, if in a contrived way.

Putting this in a slightly more coherent way. (I was having some trouble understanding the explanation, so I broke it down into layman’s terms, might make it more easily understandable)

If I assign P(0) to “Green is either” Then I assign P(1) to the statement “Green is not either”

If you assign absolute certainty to any one statement you are, by definition assigning absolute impossibility to all other possibilities.

j.edwards, I think your last sentence convinced me to withdraw the objection—I can’t very well assign a probability of 1 to ~”the green is either” can I? Good point, thanks.

that anecdote wasn’t amusing at all.

and it wasn’t an anecdote.

and it doesn’t prove the point. all it shows is that a single person didn’t know their 17 times tables off the top of their head. there’s no reason to expect someone to be as confident that 51 is or is not prime than 7 is or is not prime—and anyway, the point of the story should have been that, eventually, 7 might NOT be prime. which it’s always going to be.

i didn’t get it.

Probabilities of 0 and 1 are perhaps more like the perfectly massless, perfectly inelastic rods we learn about in high school physics—they are useful as part of an idealized model which is often sufficient to accurately predict real-world events, but we know that they are idealizations that will never be seen in real life.

However, I think we can assign the primeness of 7 a value of “so close to 1 that there’s no point in worrying about it”.

Perhaps the only appropriate uses for probability 0 and 1 are to refer to logical contradictions (e.g., P & !P) and tautologies (P → P), rather than real-world probabilities?

In stark contrast to this time last week, I now internally believe the title of this post.

I did enjoy “something, somewhere, is having this thought,” Paul, despite all its inherent messiness.

‘Green is either’ doesn’t tell us much. As far as we know it’s a nonsensical statement, but I think that makes it *more* believable than ‘green is purple’, which makes sense, but seems extremely wrong. You might as well try to assign a probability to ‘flarg is nardle’. I can demonstrate that green isn’t purple, but not that green isn’t either, nor that flarg isn’t nardle.

Is there anything truer than ‘7 is prime’? What’s the truest statement anyone can come up with? Can we definitely get no closer to 0 than 1, based on J Edwards & Paul, above?

I think you can still have probabilities sum to 1: probability 1 would be the theoretical limit of probability reaching infinite certitude. Just like you can integrate over the entire real line, i.e., from −∞ to ∞, even though those numbers don’t actually exist.

Easy: it’s a demonstration of how you can never be certain that you haven’t made an error even on the things you’re really sure about.

It’s a cheap, dirty demonstration, but one nevertheless.

You seem to think probabilities of 0 and 1 are mysterious or contradictory when discussing randomness; they aren’t. When you’re talking about randomness, you need to define your support. That mere action gives you places where the probability is zero. For example: Can the time to run 100m ever be negative? No? Then P(t ≥ 0) = 1.

No puzzle there. But your transformation to log-odds has some regularity conditions you’re violating in those cases: the transform is only defined for probabilities in (0,1). But that doesn’t mean log-odds or probabilities are flawed. Probabilities of 0 and 1, like log-odds of plus and minus infinity, are just filling in the boundaries on the system you’ve created. Mathematically, you want to be able to handle limits; that means handling limits as a probability approaches 0 or 1. That’s it.

This shouldn’t be some huge philosophical puzzle; it’s merely the need to have any mathematical system you use be complete. Sir David Cox would be the first to tell you that.
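To make the boundary behaviour concrete, here is a minimal Python sketch of the log-odds transform and its inverse. The function names `log_odds` and `from_log_odds` are chosen here for illustration and are not from the post; the natural logarithm is an arbitrary choice of base.

```python
import math

def log_odds(p: float) -> float:
    """Natural-log odds of p; only defined on the open interval (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("log-odds is undefined at p = 0 and p = 1")
    return math.log(p / (1.0 - p))

def from_log_odds(x: float) -> float:
    """Inverse transform, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# The log-odds grow without bound as p approaches 1,
# but every p strictly inside (0, 1) maps to a finite value.
for p in (0.5, 0.9, 0.999, 0.999999):
    print(p, log_odds(p))
```

Whether the endpoints are treated as limits that are never reached or as legitimate boundary values is exactly the disagreement in this thread; the code only shows that the transform itself has nothing to say at 0 and 1.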

We certainly can talk about the limit of a function whose codomain is a measure of probability being 1; the limit of the probability of a proposition as the amount of evidence in favor of it approaches infinity is 1. But that doesn’t mean that 1 is a measure of probability. Infinity is valid as the limit of a function yielding real numbers, but infinity is not a real number.

As for your example with the amount of time it takes to run a particular distance, I can’t be certain that we won’t find a region of space with strange temporal effects that allow you to take a walk and arrive at your starting point before you left. This would allow you to run a hundred meters in negative time, in at least one sense of the word. Getting that sort of speed from the runner’s point of view would be stranger, but the Dark Lords of the Matrix could probably make it happen.

Cumulant—can you state, with infinite certainty, that no-one will ever run faster than light?

Well, it does seem like someone who travels back in time to reach the finish before he got there has… not actually followed the rules of the 100-meter dash.

By the current model it is impossible for anything to move faster than light*, but what is your confidence in the current model? Certainly high, but not infinite. Let’s not mix up the map and the territory. As for running faster than light: certainly unlikely, but not infinitely so. If you define something as impossible in some model, and given that you want a probability within that model, or given that model, I don’t know what happens, however...

*With certain complications.

Another way to think about probabilities of 0 and 1 is in terms of code length.

Shannon told us that if we know the probability distribution of a stream of symbols, then the optimal code length for a symbol X is: l(X) = -log p(X)

If you consider that an event has zero probability, then there’s no point in assigning a code to it (codespace is a conserved quantity, so if you want to get short codes you can’t waste space on events that never happen). But if you think the event has zero probability, and then it happens, you’ve got a problem—system crash or something.

Likewise, if you think an event has probability of one, there’s no point in sending ANY bits. The receiver will also know that the event is certain, so he can just insert the symbol into the stream without being told anything (this could happen in a symbol stream where three As are always followed by a fourth). But again, if you think the event is certain and then it turns out not to be, you’ve got a problem: the receiver doesn’t get the code you want to send.

If you refuse to assign zero or unity probabilities to events, then you have a strong guarantee that you will always be able to encode the symbols that actually appear. You might not get good code lengths, but you’ll be able to send your message. So Eliezer’s stance can be interpreted as an insistence on making sure there is a code for every symbol sequence, regardless of whether that sequence appears to be impossible.
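The code-length argument can be sketched in a few lines of Python. This is an illustration only: the function name `code_length_bits` and the choice of base 2 (bits) are assumptions made here, since the comment writes the formula without a base.

```python
import math

def code_length_bits(p: float) -> float:
    """Ideal code length in bits for a symbol of probability p: -log2(p)."""
    if p < 0.0 or p > 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0:
        # No finite codeword exists: a "zero probability" symbol cannot be sent.
        return math.inf
    # p == 1.0 costs zero bits: the receiver already knows the symbol.
    return -math.log2(p) if p < 1.0 else 0.0

for p in (0.5, 0.25, 1e-6, 1.0, 0.0):
    print(p, code_length_bits(p))
```

Any probability strictly between 0 and 1, however small, yields a finite (if long) codeword, which is the guarantee described above.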

But then, do you really want to build a binary transmitter that is prepared to handle not only sequences of 0 and 1, but also the occasional “zebrafish” and “Thursday” (imagine somehow fitting these into an electrical signal, or don’t, because the whole point is that it can’t be done)? Such a transmitter has enormously increased complexity to handle signals that, well… won’t ever happen. I guess you could say the probability is low enough that the expected utility of dealing with it is not worth it. But what about the chance that a “zebrafish” in the launch codes will wipe out humanity? Surely that expected utility cannot be ignored? (Except it can!)

Umm, it’s a real thing. ECC memory https://en.m.wikipedia.org/wiki/ECC_memory I’m sure it isn’t 100% foolproof (coincidentally the point of this article) but I imagine it reduces error probability by orders of magnitude.

Brent,

From what I understood on reading the Wikipedia article on Bayesian probability and inferring from how he writes (and correct me if I’m wrong), Eliezer is talking about your “subjective probability.” You are a being, have consciousness, and interpret input as information. Given a lot of this information, you’ve formed an idea that 7 is prime. You’ve also formed an idea that other people exist, and that the sky is blue, which also have a high subjective probability in your mind because you have a lot of direct information to sustain that belief.

Moreover, if you’ve ever been wrong before, hopefully you’ve noticed that you have been wrong before. That’s a little information that “you are sometimes wrong about things that you are very sure of”. So, you might apply this information to your formula of your probability of the idea that “7 is prime”, so you still end up with a high probability, but not 1.

Now, you might not think that “you are sometimes wrong about things that you are sure of” about every single subject, such as primeness. But, what if you had the information that other humans, smart people, have at some point in the past, incorrectly understood the primeness of a number (the anecdote). You might state, that “human beings are sometimes wrong about the primeness of a number,” and “I am a human being.” Again, if you include that information in your calculation of the probability that the idea that “7 is prime” is true, then you end up with a high probability, but not 1.

(Oh, but what if you didn’t make the statement “human beings are sometimes wrong about the primeness of a number”, but instead, “this idiot is sometimes wrong about the primeness of a number, but I am never”? Well, you can. That’s one big problem with Bayesian subjective probabilities. How do we generalize? How can we formalize it so that two people with the same information deterministically get the same probability? Logical (or objective epistemic) probability attempts to answer these questions.)

So, you’re right that it is just “a single person” getting it wrong, that his certainty was incorrect. But that’s Eliezer’s point. We are not supreme beings lording over all reality, we are humans who have memorized some information from the past and made some generalizations, including generalizations that sometimes our generalizations are wrong.

I agree with cumulant. The mathematical subject of probability is based on measure theory, which loses a ton of convergence theorems if we exclude 0 and 1. We can agree that things that are not known a priori can’t have probability 0 or 1, but I think we must also agree that “an impossible thing will happen soon” has probability 0, because it’s a contradiction. An alternate universe in which the number 7 (in the same kind of number system as ours, etc.) is not prime is damn-near inconceivable, but an alternate universe in which impossible things are possible is purely absurd.

If our mathematical reasoning is coherent enough for it to be meaningful to make probability assignments then certainly we are not so fundamentally flawed that what we consider tautologies could be false. If you are willing to accept that maybe 0 is 1, then you can’t do any of your probability adjustments, or use Bayes’ Theorem, or anything of the sort without having a (possibly unstated) caveat that probability theory might be complete nonsense. But what’s the probability that probability theory is nonsense (i.e. false or inconsistent)? What does that even mean? We can only assign a probability if that makes sense, so conditioned on the sentence making sense, probability theory must be nonsense with probability 0, no? So averaged over all possible universes (those where probability theory makes sense, and those where it doesn’t) the sentence “probability makes sense with probability 1” better approximates the truth value of probability making sense than “probability makes sense with probability p” for p0. If it’s not, it’s still not worse, but what the hell are we even saying?

Speaking of measure theory, what probability should we assign to a uniformly distributed random real number on the interval [0, 1] being rational? Something bigger than 0? Maybe in practice we would never hold a uniform distribution over [0, 1] but would assign greater probability to “special” numbers (like, say, 1/2). But regardless of our probability distribution, there will exist subsets of [0, 1] to which we must assign probability 0.

The only way I can see around this is to refuse to talk about infinite (or at least uncountable) sets. Are there others?

I suspect Eliezer would object to my post claiming that I’m confusing map and territory, but I don’t think that’s fair. If there’s a map you’re trying to use all over the place (and you do seem to), then I claim it makes no sense to put a little region on the map labelled “maybe this map doesn’t make any sense at all”. If the map seems to make sense and you’re still following it for everything, you’ll have to ignore that region anyway. So is it really reasonable to claim that “the probability that probability makes sense is <1”?

Utilitarian:

Measure theory gives a clear answer to this: it’s 0. Which is fine. For all x, the probability that your rv will take the value x is 0. Actually the probability that your rv is computable is also 0. (Computable numbers are the largest countable class I know of.) What’s false is the tempting statement that probability 0 events are impossible. It’s only the converse that’s true: impossible events have probability 0. There’s another tempting statement that’s false, namely the statement that if S is an arbitrary collection of disjoint events, the probability of one of them happening is the sum of the probabilities of each one happening. Instead, this only holds for countable sets S. This is part of the definition of a measure.
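The countable-additivity point can be written out as a short derivation. This is a sketch under the standard measure-theoretic assumptions; the symbols $X$ (the uniform random variable on [0, 1]) and $q_1, q_2, \dots$ (an enumeration of the rationals in [0, 1]) are introduced here for illustration.

```latex
P\bigl(X \in \mathbb{Q} \cap [0,1]\bigr)
  = \sum_{n=1}^{\infty} P(X = q_n)
  = \sum_{n=1}^{\infty} 0
  = 0,
\qquad\text{whereas}\qquad
P\bigl(X \in [0,1]\bigr) = 1 \;\neq\; \sum_{x \in [0,1]} P(X = x) = 0 .
```

The first chain is legitimate because the rationals are countable; the second comparison shows why additivity cannot be extended to uncountable collections of disjoint events.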

*If there’s a map you’re trying to use all over the place (and you do seem to), then I claim it makes no sense to put a little region on the map labelled “maybe this map doesn’t make any sense at all”. If the map seems to make sense and you’re still following it for everything, you’ll have to ignore that region anyway.*

Janos, are you saying that it is *in fact* impossible that your map *in fact* doesn’t make any sense? Because I do, indeed, have a little section of my map labelled “maybe this map doesn’t make any sense at all”, and every now and then, I think about it a little, because there are so many fundamental premises of which I am unsure even in their definitions. (E.g.: “the universe exists”, and “but why?”) Just because this area of my map drops out of my everyday decision theory due to failure to generate coherent advice on preferences, does not mean it is absent from my map. “You must ignore” or rather “You should usually ignore” is decision theory, and probability theory should usually be firewalled off from preferences.

*Computable numbers are the largest countable class I know of.*

Either all countable sets are the same size anyway, or you can generate a larger set by saying “all computable reals plus the halting probability”. How about computable with various oracles?

*What’s false is the tempting statement that probability 0 events are impossible. It’s only the converse that’s true: impossible events have probability 0.*

If you cannot repose probability 1 in the statement “all events to which I assign probability 0 are impossible” you should apply a correction and stop reposing probability 0 to those events. Do you mean to say that all impossible events have probability 0, plus some more possible events also have probability 0? This makes no sense, especially as a justification for using “probability 0” in a meaningfully calibrated sense.

To use “probability 0” without a finite expectation of being infinitely surprised, you must repose probability 1 in the belief that you use “probability 0” *only* for actually impossible events; but not necessarily believe that you assign probability 0 to *every* impossible event (satisfying both conditions implies logical omniscience).

I should mention that I’m also an infinite set atheist.

I can admit the possibility that probability doesn’t work, but not have to do anything about it. If probability doesn’t work and I can’t make rational decisions, I can expect to be equally screwed no matter what I do, so it cancels out of the equation.

The definable real numbers are a countable superset of the computable ones, I think. (I haven’t studied this formally or extensively.)

If you don’t want to assume the existence of certain propositions, you’re asking for a probability theory corresponding to a co-intuitionistic variant of minimal logic. (Co-intuitionistic logic is the logic of affirmatively false propositions, and is sometimes called Popperian logic.) This is a logic with false, or, and (but not truth), and an operation called co-implication, which I will write a <-- b.

Take your event space L to be a distributive lattice (with ordering <), which does not necessarily have a top element, but does have dual relative pseudo-complements. The co-implication (a <-- b) is the element characterized by the condition that, for all x in the lattice L,

b < (a or x) if and only if (a <-- b) < x

Now, we take a probability function to be a function from elements of L to the reals, satisfying the following axioms:

P(false) = 0

if A < B then P(A) <= P(B)

P(A or B) + P(A and B) = P(A) + P(B)

There you go. Probability theory without certainty.
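As a sanity check, the three axioms can be tested mechanically on a small example. The sketch below is an illustration under stated assumptions, not the construction from the comment: events are subsets of four hypothetical atoms, "or" and "and" are union and intersection, and the made-up `WEIGHTS` are deliberately not normalized. This finite lattice does happen to have a top element, but none of the axioms constrains its value.

```python
from itertools import combinations

# Hypothetical, unnormalized weights on four atoms (they sum to 2.1, not 1).
WEIGHTS = {"a": 0.2, "b": 0.5, "c": 1.3, "d": 0.1}

def P(event: frozenset) -> float:
    """Additive set function: the total weight of the atoms in the event."""
    return sum(WEIGHTS[x] for x in event)

def all_events(atoms):
    """Every subset of the atoms; the empty set plays the role of 'false'."""
    atoms = list(atoms)
    return [frozenset(c)
            for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def check_axioms(events, tol: float = 1e-12) -> bool:
    """Check P(false) = 0, monotonicity, and the inclusion-exclusion law."""
    if P(frozenset()) != 0:
        return False
    for A in events:
        for B in events:
            if A <= B and P(A) > P(B) + tol:      # A < B implies P(A) <= P(B)
                return False
            lhs = P(A | B) + P(A & B)             # P(A or B) + P(A and B)
            if abs(lhs - (P(A) + P(B))) > tol:    # ... = P(A) + P(B)
                return False
    return True

print(check_axioms(all_events(WEIGHTS)))
```

Nothing in the check refers to a value of 1, which is the sense in which the axioms describe probability without certainty.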

This is not terribly satisfying, though, since Bayes’s theorem stops working. It fails because *conditional probabilities* stop working: they arise from a forced normalization that occurs when you try to construct a lattice homomorphism between an event space and a conditionalized event space.

That is, in ordinary probability theory (where L is a Boolean algebra, and P(true) = 1), you can define a conditionalization space L|A as follows:

L|A = { X in L | X < A }

true’ = A

false’ = false

and’ = and

or’ = or

not’(X) = not(X) and A

P’(X) = P(X)/P(A)

with a lattice homomorphism X|A = X and A

Then, the probability of a conditionalized event P’(X|A) = P(X and A)/P(A), which is just what we’re used to. Note that the definition of P’ is forced by the fact that L|A must be a probability space. In the non-certain variant, there’s no unique definition of P’, so conditional probabilities are not well-defined.
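The ordinary (Boolean) case is easy to check on a concrete example. The following sketch assumes a fair six-sided die as the event space; the names `conditionalize` and `restrict` are hypothetical helpers introduced here, corresponding to P’ and to the homomorphism X|A = X and A.

```python
from fractions import Fraction

# Assumed example space: a fair six-sided die.
P_ATOM = {face: Fraction(1, 6) for face in range(1, 7)}

def P(event) -> Fraction:
    """Probability of an event given as a set of faces."""
    return sum((P_ATOM[x] for x in event), Fraction(0))

def restrict(X, A) -> frozenset:
    """The lattice homomorphism X|A = X and A."""
    return frozenset(X) & frozenset(A)

def conditionalize(A):
    """Return P' on the space L|A, defined by P'(X) = P(X) / P(A)."""
    A = frozenset(A)
    p_a = P(A)
    if p_a == 0:
        raise ZeroDivisionError("cannot conditionalize on a probability-0 event")

    def P_prime(X) -> Fraction:
        X = frozenset(X)
        if not X <= A:
            raise ValueError("X must be an element of L|A, i.e. a subset of A")
        return P(X) / p_a

    return P_prime

even = {2, 4, 6}
high = {4, 5, 6}
P_given_even = conditionalize(even)
print(P_given_even(restrict(high, even)))  # P(high and even) / P(even)
print(P_given_even(even))                  # the new 'true' gets probability 1
```

The division by P(A) is the forced normalization mentioned above: it is the only rescaling that gives the new top element A probability 1.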

To regain something like this for cointuitionistic logic, we can switch to tracking degrees of disbelief, rather than degrees of belief. Say that:

D(false) = 1

for all A, D(A) > 0

if A < B then D(A) >= D(B)

D(A or B) + D(A and B) = D(A) + D(B)

This will give you the bounds you need to nail down a conditional disbelief function. I’ll leave that as an exercise for the reader.

Hi guys, you don’t know me and I prefer to stay anonymous. I look at it backwards and get the very same result as Eliezer Y. What is total degeneracy? In practice, it is being totally impervious to updating, regardless of the magnitude of the information seen (even infinity). That can only be achieved by unit or null probabilities as priors. Bayesian updating never takes you there (as posteriors). And no updating can take place from that situation. Anonymous
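This imperviousness is easy to see numerically. The sketch below applies Bayes’ theorem for a single binary hypothesis; the function name `bayes_update` and the likelihood values are illustrative assumptions, not taken from the post.

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H | E) for a binary hypothesis H after observing E."""
    numerator = prior * p_e_given_h
    evidence = numerator + (1.0 - prior) * p_e_given_not_h
    if evidence == 0.0:
        raise ValueError("the observed evidence has probability 0 under this prior")
    return numerator / evidence

# A prior of exactly 0 ignores evidence that favors H 999:1 ...
print(bayes_update(0.0, 0.999, 0.001))
# ... and a prior of exactly 1 ignores evidence that disfavors H 999:1.
print(bayes_update(1.0, 0.001, 0.999))

# A non-degenerate prior moves toward 1 under repeated 9:1 evidence,
# but mathematically never arrives there after finitely many updates.
p = 0.5
for _ in range(10):
    p = bayes_update(p, 0.9, 0.1)
print(p)
```

In floating-point arithmetic a long enough run of updates would eventually round to 1.0, but that is an artifact of the number representation rather than of Bayes’ theorem.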

*If the map seems to make sense and you’re still following it for everything, you’ll have to ignore that region anyway.*

Just cos it’s not a very nice place to visit, doesn’t mean it ain’t on the map. ;)

“1, 2, and 3 are all integers, and so is −4. If you keep counting up, or keep counting down, you’re bound to encounter a whole lot more integers. You will not, however, encounter anything called “positive infinity” or “negative infinity”, so these are not integers.”

This bothered me, more to the point, it hit on some stuff I’ve been thinking about. I realize I don’t have a very good way to precisely state what I mean by “finite” or “eventually”

The above, for instance, basically says “if infinity is not an integer, then if I start at an integer and move an integer number of steps away from it, I will still be at an integer that’s not infinity, therefore infinity isn’t an integer”

But if we allowed infinity to be considered an integer, then we allow an infinite number of steps...

How about this: if N is a non infinite integer, SN is N’s successor, PN is N’s predecessor, neither SN nor PN will be infinite. Great, no matter where we start from, we can’t reach an infinity in one step, so that seems to make this notion more solid.

but… if N is an infinity, then neither SN nor PN (thinking about ordinals now, btw, instead of cardinals) will be finite. Doh.

So the situation seems a bit symmetric here. This is really annoying to me.

I have as of late been getting the notion that the notions of “finite” and “eventually” are so tied to the idea of mathematical induction that it’s probably best to define the former in terms of the latter… ie, the number of steps from A to B is finite if and only if induction arguments starting from A and going in the direction toward B actually validly prove the relevant proposition for B.

This is a vague notion, but near as I can tell, it comes closest to what I actually think I mean when I say something like “finite” or “eventually reach in a finite number of steps” or something like that.

ie, finite values are exactly those critters for which mathematical induction arguments can be used on. (maybe this is a bad definition. I’m more stating it as a “here’s my suspicion of what may be the best basis to really represent the concept”)

Anyways, as far as 0,1 not being probabilities… While I agree that one shouldn’t believe a proposition with probability 0 or 1, I’m not sure I’d consider them nonprobabilities. Perhaps “unreachable” probabilities instead. Disallowing stuff like sum to 1 normalizations and so on would seem to require “unnatural” hoops to jump through to get around that.

Unless, of course, someone has come up with a clean model without that. (If so, well, I’m curious too.)

Eliezer:

I’m not sure what an “infinite set atheist” is, but it seems from your post that you use different notions of probability than what I think of as standard modern measure theory, which surprises me. Utilitarian’s example of a uniform r.v. on [0, 1] is perfect: it must take some value in [0, 1], but for all x it takes value x with probability 0. Clearly you can’t say that for all x it’s impossible for the r.v. to take value x, because it must in fact take one of those values. But the probabilities are still 0. Pragmatically the way this comes out is that “probability 0” doesn’t imply impossible. If you perform an experiment countably-infinitely many times with the probability of a certain outcome being 0 each time, the probability of ever getting that outcome is 0; in this sense you can say the outcome is almost impossible. However it’s possible that each outcome individually is almost impossible, even though of course the experiment will have an outcome.

You can object that such experiments are physically impossible e.g. because you can only actually measure/observe countably many outcomes. That’s fine; that just means you can get by with only discrete measures. But such assumptions about the real world are not known a priori; I like usual measure theory better, and it seems to do quite a good job of encompassing what I would want to mean by “probability”, certainly including the discrete probability spaces in which “probability 0” can safely be interpreted to mean “impossible”.

You’re right, it’s not that hard to come up with larger countable classes of reals than the computables; I just meant that all of the usual, “rolls-off-the-tip-of-your-tongue” classes seem to be subsets of the computables. But maybe Nick is right, and the definables are broader. I haven’t studied this either.

And yes, I also sometimes think about how assumptions I make about life and the perceptible universe could be wrong, but I do not do this much for mathematics that I’ve studied deeply enough, because I’m almost as convinced of its “truth” as I am of my own ability to reason, and I don’t see the use in reasoning about what to do if I can’t reason. This is doubly true if the statements I’m contemplating are nonsense unless the math works.

Eliezer:

I am curious as to why you asked Peter not to repeat his stunt.

Also, I would really like to know how confident you are in your infinite set atheism and for that matter in your non-standard philosophy of mathematics attitudes in general.

Regarding infinite set atheism:

Is the set of “possible landing sites of a struck golf ball” finite or infinite?

In other words, can you finitely parameterize locations in space? Physicists normally model “position” as n-tuples of real numbers in a coordinate system; if they were forced to model position discretely, what would happen?

I can claim to see an infinite set each time I use a ruler...

Doug S., I believe according to quantum mechanics the smallest unit of length is Planck length and all distances must be finite multiples of it.

Eliezer:

*I should mention that I’m also an infinite set atheist.*

You’ve mentioned this before, and I have always wondered: what does this mean? Does it mean that you don’t believe there are any infinite sets? If so, then you have to believe that a mathematician who claims the contrary (and gives the standard proof) is making a mistake somewhere. What is it?

Frankly, even if you actually are a finitist (which I find hard to imagine), it doesn’t seem relevant to this discussion: every argument you have presented could equally well have been given by someone who accepts standard mathematics, including the existence of infinite sets.

The nature of 0 & 1 as limit cases seems to be fascinating for the theorists. However, in terms of ‘Overcoming Bias’, shouldn’t we be looking at more mundane conceptions of probability? EY’s posts have drawn attention to the idea that the amount of information needed to add additional certainty on a proposition increases exponentially while the probability increases linearly. This says that in utilitarian terms, not many situations will warrant chasing the additional information above 99.9% certainty (outside technical implementations in nuclear physics, rocket science or whatever). 99.9% as a number is taken out of a hat.

In human terms, when we say ‘I’m 99.9% sure that 2+2 always =4’, we’re not talking about 1000 equivalent statements. We’re talking about one statement, with a spatial representation of what ‘100% sure’ means with respect to that statement, and 0.1% of that spatial representation allowed for ‘niggling doubts’, of the sort: what have I forgotten? What don’t I know? What is inconceivable for me?

The interesting question for ‘overcoming bias’ is: how do we make that tradeoff between seeking additional information on the one hand and accepting a limited degree of certainty on the other? As an example (cf. the Evil Lords of the Matrix), considering whether our minds are being controlled by magic mushrooms from Alpha Pictoris may someday increase the ‘niggling doubt’ range from 0.1% to 5%, but the evidence would have to be shoved in our faces pretty hard first.

*Doug S., I believe according to quantum mechanics the smallest unit of length is Planck length and all distances must be finite multiples of it.*

Not in standard quantum mechanics. Certain of the many ~~theories~~ unsupported hypotheses of quantum gravity (such as Loop Quantum Gravity) might say something similar to this, but that doesn’t abolish every infinite set in the framework. The total number of “places where infinity can happen” in modern models has tended to increase, rather than decrease, over the centuries, as models have gotten more complex. One can never prove that nature *isn’t* “allergic to infinities” (the skeptic can always claim, “wait, but if we looked even closer or farther, maybe we would see a heretofore unobserved brick wall”), but this allergy is not something that has been empirically observed.

I think Eliezer’s “infinite set atheism” is a belief that infinite sets, although well-defined mathematically, do not exist in the “real world”; in other words, that any physical phenomenon that actually occurs can be described using a finite number of bits. (This can include numbers with infinite decimal expansions, as long as they can be generated by a finitely long computer program. Therefore, using pi in equations is not prohibited, because you’re using the symbol “pi” to represent the program, which is finite.)

A consequence of “infinite set atheism” seems to be that the universe is a finite state machine (although one that is not necessarily deterministic). Am I understanding this properly?

What do you mean by “infinite set atheism”? You are essentially stating that you don’t believe in mathematical limits—because that is one of the major consequences of infinite sets (or sequences).

If you don’t believe in those… well, you lose calculus, you lose the density of real numbers, you lose the need or understanding of many events with probability 0 or 1, and you lose the point of Zeno’s Paradox.

Janos is spot on about measure zero not implying impossibility. What is the probability of a golf ball landing at any exact point? Zero. But it has to land somewhere, so no one point is impossible.

Impossibility would mean absence from your sigma algebra. What’s that you ask? Without making this painful, you need three things for probability: an idea of what constitutes “the space of everything”, an idea of what constitutes possible events out of that space which we can confirm or deny, and an assignment of numbers to those events. (This is often LaTeX’ed as (\Omega, \mathcal{F}, P).) The conversation here seems to be confusing the filtration/sigma-algebra F with the numbers assigned to those events by P.

Can we choose which we’re talking about: events or numbers?

Wrong.

I don’t know which is more painful: Eliezer’s errors, or those of his detractors.

Perhaps you could clarify what exactly is an infinite set atheist in a full post...or maybe it’s only worth a comment.

Cumulant, I think the idea behind “infinite set atheism” is not that limits don’t exist, but that infinities are acceptable *only* as limits approached in a specified way. On this view, limits are not a *consequence* of infinite sets, as you contend; rather, only the limit exists, and the infinite set or sequence is merely a sloppy way of thinking about the limit.

Eliezer, I’ll second Matthew’s suggestion above that you write a post on infinite set atheism; it looks as if we don’t understand you.

I *think* I understand the motive for rejecting infinite sets (viz., that whenever you deal with infinities you get all sorts of ridiculously counterintuitive results: sums coming out different when you reärrange the terms, the Banach-Tarski paradox, &c., &c.), but I’m not sure you can give up infinite sets without *also* giving up the real numbers (as others have touched on above), which seems *very* wrong.

Caledonian: Not wrong. Take the field you’re swinging at to be a plane. There are infinitely many points in that plane; that’s just the density of the reals.

Now say there is some probability density of landing spots; and, let’s say no one spot is special in that it attracts golf balls more than points immediately nearby (i.e. our pdf is continuous and non-atomic). Right there, you need every point (as a singleton) to have measure 0.

Go pick up Billingsley: measure 0 is not the same as impossible nor does it cause any problems.

And the location that the ball lands on will *also* be composed of infinitely many reals. Shall we compare the size of two infinite sets?

I’d say that the ball is a sphere and consider the first point of impact (i.e. the tangency point of the plane to the sphere). Otherwise, you need to know a lot about the ball and the field where it lands.

You can compare infinite sets. Take the sets A and B, A={1,2,3,...} and B={2,3,4,...}. B is, by construction, a subset of A. There’s your comparison; yet, both are infinite sets.

What assumptions would you make for the golf ball and the field? (To keep things clear, can we define events and probabilities separately?)

Caledonian, every undergraduate who has ever taken a statistics class knows that the probability of any single point in a continuous distribution is zero. Probabilities in continuous space are measured on intervals. Basic calculus...

*I believe according to quantum mechanics the smallest unit of length is Planck length and all distances must be finite multiples of it.*

This is what I’m given to understand as well. Doesn’t this take the teeth out of Zeno’s paradox?

*Pragmatically the way this comes out is that “probability 0” doesn’t imply impossible.*

Janos, would you agree that P=0 is a probability to the same degree that infinity is a number? Apologies for double post.

Gowder, everyone who’s ever given the issue more than three-seconds’-thought knows that no statistical result ever involves a single point.

Usually, if a die lands on edge we say it was a spoiled throw and do it over. Similarly if a Dark Lord writes 37 on the face that lands on top, we complain that the Dark Lord is spoiling our game and we don’t count it.

We count 6 possibilities for a 6-sided die, 5 possibilities for a 5-sided die, 2 possibilities for a 2-sided die, and if you have a die with just one face—a spherical die—what’s the chance that face will come up?

I think it would be interesting to develop probability theory with no boundaries, with no 0 and 1. It works fine to do it the way it’s done now, and the alternative might turn up something interesting too.

Ben:

Well, that depends on your number system. For some purposes +infinity is a very useful value to have. For instance if you consider the extended nonnegative reals (i.e. including +infinity) then every measurable nonnegative extended-real-valued function on a measure space actually has a well-defined extended-nonnegative-real-valued integral. There are all kinds of mathematical structures where an infinity element (or many) is indispensable. It’s a matter of context. The question of what is a “number” is I think very vague given how many interesting number-like notions mathematicians have come up with. But unquestionably “infinity” is not a natural number, or a real number, or a complex number.

Probability theory, on the other hand, would have to change shape if we comfortably wanted to exclude 0 probabilities. What we now call measures would be wrong for the job. I don’t know how it would look, but I find the standard description intuitively appealing enough that I don’t think it should be changed. It’s probably true that for a Bayesian inference engine of some sort, whose purpose is to find likelihoods of propositions given evidence, the “probabilities” it keeps track of shouldn’t become 0 or 1. If there’s a rich theory there focussing on how to practically do this stuff (and I bet there is, although I know nothing of it beyond Bayes’ Theorem, which is a simple result) then ignoring the possibility of 0s and 1s makes sense there: for example you can use the log odds. But in general probability theory? No.

I think it would be interesting to develop probability theory with no boundaries, with no 0 and 1. It works fine to do it the way it’s done now, and the alternative might turn up something interesting too.

You might want to check out Kosko’s Fuzzy Thinking. I haven’t gone any further into fuzzy logic, yet, but that sounds like something he discussed. Also, he claimed probability was a subset of fuzzy logic. I intend to follow that up, but there is only one of me, and I found out a long time ago that they can write it faster than I can read it.

“On some golf courses, the fairway is readily accessible, and the sand traps are not. The green is either.”

Haha, very nice CGD. Shows how much those philosophers of language know about golf. :-)

Although… hmm… interesting. I think that gives us a way to think about another probability 1 statement: statements that occupy the entire logical space. Example: “either there are probability 1 statements, or there are not probability 1 statements.” That statement seems to be true with probability 1...

Disallowing a symbol for “all events” breaks the definition of a probability space. It’s probably easier to allow extended reals and break some field axioms than to figure out how to do rigorous probability without a sigma-algebra.

When re-working this into a book, you need to double check your conversions of log odds into decibels. By definition, decibels are calculated using log base 10, but some of your odds are natural logarithms, which confused the heck out of me when reading those paragraphs.

Probability .0001 = −40 decibels (This is the only correct one in this post, all “decibel” figures afterwards are listed as 10 * the natural logarithm of the odds.)

Probability 0.502 = 0.035 decibels

Probability 0.503 = 0.052 decibels

Probability 0.9999 = 40 decibels

Probability 0.99999 = 50 decibels
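For anyone who wants to re-check figures like these, here is a minimal sketch of the conversion, assuming the convention that decibels are 10 times the base-10 logarithm of the odds P/(1 - P). The function name `probability_to_decibels` is made up for illustration.

```python
import math

def probability_to_decibels(p: float) -> float:
    """Return 10 * log10 of the odds p / (1 - p)."""
    if not 0.0 < p < 1.0:
        # At exactly 0 or 1 the odds are 0 or undefined, so there is no finite answer.
        raise ValueError("log odds are finite only for 0 < p < 1")
    return 10.0 * math.log10(p / (1.0 - p))

# Print the five probabilities quoted in the comment above.
for p in (0.0001, 0.502, 0.503, 0.9999, 0.99999):
    print(f"P = {p}: {probability_to_decibels(p):+.3f} dB")
```

Swapping `math.log10` for `math.log` reproduces the natural-logarithm mix-up the comment complains about.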

P.S. It’d be nice if you provided an RSS feed for the comments on a post, in addition to the RSS feed for the posts...

I cannot begin to imagine where those numbers came from. Dangers of “Posted at 1:58 am”, I guess. Fixed.

Could you respond to Neel Krishnaswami’s post above, and this one as well?

P(A&B)+P(A&~B)+P(~A&B)+P(~A&~B)=1

Isn’t the “1” above a probability?

My intuition as a mathematician declares that nobody will ever develop an elegant mathematical formulation of probability theory that does not allow for statements that are logically impossible or certain, such as statements of the form p AND NOT p. And it is necessary, if the theory is to be isomorphic to the usual one, that these statements have probability 0 (if impossible) or 1 (if certain). However, I believe that it is quite reasonable to declare, as a condition demanded of any prior deemed rational, that only truly impossible or certain statements have those probabilities. I think that this gives you what you want.

It’s obvious that you can make this very demand when working with discrete probability distributions. It may not be obvious that you can make this demand when working with continuous probability distributions. Certainly the usual theory of these, based on so-called ‘measure spaces’ and ‘σ-algebras’ (I mention those in case they jog the reader’s memory), cannot tolerate this requirement, at least not if anything at all similar to the usual examples of continuous distributions is allowed.

One answer is that only discrete probability distributions apply to the real world, in which one can never make measurements with infinite precision or observe an infinite sequence of events. Even if the world has infinite size or is continuous to infinitesimal scales, you will never observe that, so you don’t need to predict anything about that.

However, even if you don’t buy this argument, never fear! There is a mathematical theory of probability based on ‘pointless measure spaces’ and ‘abstract σ-algebras’. In this theory, it again makes perfect sense to demand that any prior must assign probability 0 or 1 only to impossible or certain events. The idea is that if something can never be observed, even in principle, then it is effectively impossible, and the abstract pointless theory allows one to treat it as such.

Then I agree that one should require, as a condition on considering a prior to be rational, that it should assign probability 0 only to these impossible events and assign probability 1 only to their certain complements.

PS: cumulant-nimbus above gives a brief summary of the usual approach to measure theory. The pointless approach that I advocate can be suggested from that as follows: taboo \Omega. Neel Krishnaswami’s comment is implicitly using the pointless approach; his event space is cumulant-nimbus’s \mathcal{F}, and he works entirely in terms of events.

As Perplexed points out, this is usually known as Cromwell’s rule.

Thanks for the link. It sounds like Yudkowsky is arguing something quite close to Cromwell’s Rule, with a slight technical difference. From the Wikipedia article:

Yudkowsky would argue that formal logic is not part of the territory, but rather part of our map (perhaps surveying equipment would be a good analogy, since the compass analogy is already taken by “moral compass”). As such, not even formal mathematical logic should be presumed to have 100% certainty.

Of course, this raises the problem of constantly having to include the term p(math is fundamentally flawed) everywhere. Instead of just writing p(heads) when calculating the odds of a coin flip or flips, now we’d have to use p(heads | ~math is fundamentally flawed). As a matter of sheer convenience, it would be easier to just add it to the list of axioms supporting the fundamental theorems that the rest of mathematics is built on.

But that’s just semantics, I suppose. Wikipedia has a couple more interesting tidbits, that I’ve fished out for future readers:

I’m kinda surprised that it’s only been mentioned once in the comments (I only just discovered this site, really really great, by the way), and in a comment from 2010 at that, but it seems to me that “a magical symbol to stand for ‘all possibilities I haven’t considered’” does exist: the symbol “~” (i.e. not). Even the commenter who does mention it makes things complicated for himself: P(Q or ~Q)=1 is the simplest example of a proposition with probability 1.

The proposition is of course a tautology. I do think (but I’m not sure) that that is the only sort of statement that receives probability 1. This is in sync with Eliezer’s “amount of evidence” interpretation. A bayesian update can only generate 1 if the initial proposition was of probability 1 or if the evidence was tautological (i.e. if Q then Q or, slightly less lame, if “Q or R” and “~R” then Q, where “Q or R” and “~R” are the evidence).

Skimming the comments, I saw two other proposals for “sure bets”, the runner who clocked a negative time and the golf ball landing in a particular spot. That last one degenerated pretty quickly into a discussion about how many points there are in a field and on a ball. I think that’s typical of such arguments: it depends on your model. Once you have your model specified the probability becomes 1 (or not) if the statement is (or isn’t) tautological in the model. If the model isn’t specified, then neither is the statement (what is a precise point?) and hence the probability. Ask the next man what the probability is of a runner clocking a negative time and he’ll rightly respond: “Huh?” (unless he is a particularly obfuscatory know-it-all, in which case he might start blabbering about the speed of light. But then too, he makes a claim because he can ascribe meaning to the question, that is, he picks his model). So these are also tautological examples.

I think Eliezer’s claims hold up pretty well for propositions that aren’t tautological and hence are empirical in nature: they require evidence, and only tautological evidence will suffice for certainty.

About the problem of inserting 0′s in certain standard theorems: I don’t see a problem with Bayes’ theorem (I’m curious about other examples). Dividing by 0 is not defined, so the probability of it raining when hell freezes over is not defined. That seems like a satisfactory arrangement.

Jaynes avoids P(A|B) for “probability of A given evidence B” and P(B) for “probability of B”, preferring P(A|BX) and P(B|X) where X is one’s background knowledge. This and the above leads naturally to the question of ~X: the situation in which one’s “background knowledge” is false.

Assume that background knowledge X is the conjunction of a finite number of propositions. ~X is true if any of these propositions is false. If we can factor X into YZ, where Y is the portion we suspect of being false (that is, if we can isolate for testing a portion of those beliefs we previously treated as “background knowledge”), then we can ask about P(A|BYZ) and P(A|B·~Y·Z).

Thanks for the analysis, MathijsJ! It made perfect sense and resolved most of my objections to the article.

I was willing to accept that we cannot reach absolute certainty by accumulating evidence, but I also came up with multiple logical statements that undeniably seemed to have probability 1. Reading your post, I realized that my examples were all tautologies, and that your suggestion to allow certainty only for tautologies resolved the discrepancy.

The Wikipedia article timtyler linked to seems to support this: “Cromwell’s rule [...] states that one should avoid using prior probabilities of 0 or 1, except when applied to statements that are logically true or false.” This matches your analysis—you can only be certain of tautologies.

Also, your discussion of models neatly resolves the distinction between, say, a mathematically-defined die (which can be certain to end up showing an integer between 1 and 6) and a real-world die (which cannot quite be known for sure to have exactly six stable states).

Eliezer makes his position pretty clear: “So I propose that it makes sense to say that 1 and 0 are not in the probabilities; just as negative and positive infinity, which do not obey the field axioms, are not in the real numbers.”

It’s true—you cannot ever reach a probability of 1 if you start at 0.5 and accumulate evidence, just as you cannot reach infinity if you start at 0 and add integer values. And the inverse is true, too—you cannot accumulate evidence against a tautology and bring its probability down to anything less than 1. But this doesn’t mean a probability of 1 is an incoherent concept or anything.
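The "cannot reach 1 by accumulating evidence" point can be checked with exact arithmetic. This is only a sketch: the `update` helper and the choice of 50 observations at 10:1 each are illustrative assumptions, and `Fraction` is used so that floating-point rounding cannot fake a result of exactly 1.

```python
from fractions import Fraction

def update(prior: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    Assumes 0 <= prior < 1; a prior of exactly 1 would divide by zero here.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

p = Fraction(1, 2)
for _ in range(50):
    # Each observation is assumed to be 10:1 evidence in favor.
    p = update(p, Fraction(10))

print(p < 1)          # still strictly below 1
print(float(1 - p))   # the remaining gap, tiny but nonzero
```

Any finite likelihood ratio multiplies the odds by a finite number, so the odds stay finite and the probability stays below 1 no matter how many updates are applied.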

Eliezer: if you’re going to say that 0 and 1 are not probabilities, you need to come up with a new term for them. They haven’t gone away completely just because we can’t reach them.

Edit a year and a half later: I agree with the article as written, partially as a result of reading How to Convince Me That 2 + 2 = 3, and partially as a result of concluding that “tautologies that have probability 1 but no bearing on reality” is a useless concept, and that therefore, “probability 1” is a useless concept.

For any state of information X, we have P(A or not A | X) = 1 and P(A and not A | X) = 0. We have to have 0 and 1 as probabilities for probability theory even to work. I think you’re taking a reasonable idea (that P(A | X) should be neither 0 nor 1 when A is a statement about the concrete physical world) and trying to apply it beyond its applicable domain.

Consider the set of all possible hypotheses. This is a countable set, assuming I express hypotheses in natural language. It is potentially infinite as well, though in practice a finite mind cannot accommodate infinitely long hypotheses. To each hypothesis, I can try to assign a probability, on the basis of available evidence. These probabilities will be between zero and one. What is the probability that a rational mind will assign at least one hypothesis the status of absolute certainty? Either this is one (there is definitely such a hypothesis), or zero (there is definitely not such a hypothesis, which cannot be, because the hypothesis “there is definitely not such a hypothesis” is then a counterexample), or somewhere in between (there may be, somewhere, a hypothesis that a rational mind would regard as being absolutely certain). So I cannot accept your hypothesis that there does not exist, anywhere, ever, a hypothesis that I should regard as being absolutely certain.

Self-referential hypotheses do not always map to truth values, and “a rational mind will assign at least one hypothesis the status of absolute certainty” is self-referential. The contradiction you’ve encountered arises from using a statement isomorphic to “this statement is false” and requiring it to have a truth value, not to a problem with excluding 0 and 1 as probabilities.

Yes 0 and 1 are not probabilities. They’re truth or falseness values. it’s necessary to make a third ‘truth value’ for things that are unprovable, and possibly a fourth for things that are untestable.

Digging up an old thread here, but an interesting point I want to bring up: a friend of mine claims that he internally assigns probability 1 (i.e. an undisprovable belief) only to one statement: that the universe is coherent. Because if not, then mnergarblewtf. Is it reasonable to say that even though no statement can actually have probability 1 if you’re a true Bayesian, it’s reasonable to internally establish an axiom which, if negated, would just make the universe completely stupid and not worth living in any more?

There’s a lot of logic to that. For extremely unlikely possibilities you can often get away with setting their probability to 0 to make the calculations a lot simpler. For possibilities where predicted utility is independent of your actions (like “reality is just completely random”) it can also be worthwhile setting their probability to 0 (ie. ignoring them), since they’re approximately a constant term in expected utility. These are good ways of approximating actual expected utility so you can still mostly make the right decisions, which bounded rationality requires.

What is P(A|A)?

What do you mean by “|A”? It’s well-defined in mathematics, sure, but in real life, surely the furthest you can go is “|experience/perception of evidence for A”.

Also, there’s also the probability that the particular version of logic you’re using is wrong.

How far you can go depends on what you mean by “go”.

It’s perfectly possible to calculate, say, P(I see the coin come up heads | the coin is flipped once, it is fair, and I see the outcome), and actually much more difficult to calculate P(I see the coin come up heads | I have experience/perception of evidence for the facts that the coin is flipped once, it is fair, and I see the outcome).

“I see” is what I meant by perception/experience of evidence. Whenever I “see” something, there’s always a non-zero chance of my brain deceiving me. The only thing you can really have to base your decisions on is P(I see the coin come up heads | I see/know the coin is flipped once, I know it is fair, and I see the outcome). P(the coin comes up heads | the coin is flipped once, it is fair and I know the outcome) is possible and easy to calculate, but not completely accurate to the world we live in.

A charitable paraphrase of “The universe is coherent” could be a statement of the universal validity of non-contradiction: For every p, not (p and not p). However, given the existence of paraconsistent logic and philosophers who take dialetheism seriously, I cannot assign probability 1 to the claim that no aspect of the universe requires a contradiction in its description.

I would go even further and say that I am considerably more certain of many other claims (such as “1+1=2” and “2+2=4”) than of such general and abstract propositions as “the universe is coherent” or even “there are no true contradictions”.

I don’t think he goes quite that far—he assigns no statements probability 0 or 1 within our own logic system, even (P and ¬P), because he believes it to be possible (though not very likely) that some other logic system might supersede our own.

His belief is that it is not possible for ALL systems of logic to be incorrect, i.e. that (it is impossible to reason correctly about the universe) is necessarily false.

No, it’s not. It’s the same fundamental mistake that a lot of religious rhetoric about “faith” and “meaning” is founded on: that wanting something to be true counts as evidence that it is true. There’s no reason to think that the universe depends for any of its properties on whether someone finds it stupid or not, or worth living in.

I’d also suggest you try to draw your friend out a bit on what it means exactly for the universe to be “coherent.” Can that notion be expressed formally? What would we expect to see if we lived in an incoherent universe?

Obviously, I’m dubious that the “coherence” of the universe is in any proper sense a philosophical or scientific idea—it sounds a lot more like an aesthetic one.

I think he just means “coherent” as “one which we can actually model based on our observations”, i.e. one in which this whole exercise (rationality) makes any sense.

He expects that the universe be incoherent with probability zero, and doesn’t think there would be any sensible observations if this were the case (or any observation being possible if this were the case).

ETA: Merriam-Webster Definition of COHERENT

1 a : logically or aesthetically ordered or integrated : consistent b : having clarity or intelligibility : understandable

So, understandable and consistent: a universe which philosophy, mathematics and science can apply to in any meaningful way.

Richard Jeffrey, “Probability and the art of judgement”.

I leave it as an exercise to correctly state the relationships between Eliezer’s article, the Jeffrey quote, and the value of P(A|A).

(Note: Jeffrey is not to be confused with Jeffreys, although both were Bayesian probability theorists.)

Interesting Log-Odds paper by Brian Lee and Jacob Sanders, November 2011.

“When you work in log odds, the distance between any two degrees of uncertainty equals the amount of evidence you would need to go from one to the other. That is, the log odds gives us a natural measure of spacing among degrees of confidence.”

That observation is so useful and intuition-friendly that it probably deserves its own blog post, and a prominent place in your book.

Forgive me if this sounds condescending, but isn’t saying “0 and 1 are not probabilities because they won’t let you update your knowledge” basically the same as saying “you can’t know something because knowing makes you unable to learn”? If we assign tautologies as having probability 1, then anything reducible to a tautology should have probability 1 (and similarly, all contradictions and things reducible to contradictions should have probability 0). For any arbitrarily large N, if you put 2 apples next to 2 apples and repeat the test N times, you’ll get 4 apples N out of N times, no less (discounting molecular breakdowns in the apples or other possible interferences).

You shouldn’t assign tautologies probability 1 either because your notion of what a tautology is might be a hallucination.

This confuses object level and meta level. In probability theory, P(~A|A) = 0 and P(A|A) = 1, however uncertain you may be about Cox’s theorem, or about whether you are actually thinking about the same A each time it appears in those formulas. No one, as far as I know, has ever constructed a theory of probability in which these are assigned anything else but 0 and 1. That is not to say that it cannot be done, only that it has not been done. Until that is done, 0 and 1 are probabilities.

The title of the article is a rhetorical flourish to convey the idea elaborated in its body, that to assert a probability, as a measure of belief, of 0 or 1 is to assert that no possible evidence could update that belief, that 0 and 1 are probabilities that you should not find yourself assigning to matters about which there could be any real dispute, and to suggest odds ratios or their logarithms as a better concept when dealing with practical matters associated with very low or very high probabilities. There is a very large difference between saying that the probability of winning a lottery is tiny and saying that it cannot happen at all; with enough participants it is almost certain to happen to someone. That difference is made clear by the log-odds scale, which puts the chance of a lottery ticket at 60 or more decibels below zero, not infinitely far below. In a world with 7 billion people, billion-to-1 chances happen every day.

As an example of even tinier probabilities which are still detectably different from zero, consider a typical computer. A billion transistors in its CPU, clocked a billion times a second, running for a conveniently round length of time, a million seconds, which is about 12 days. Computers these days can easily do that without a single hardware error, which means that for every one of a million billion billion switching events, a transistor opened or closed exactly as designed. A million billion billion is about 1.7 times Avogadro’s number. The corresponding log-odds is −240 decibels. And yet hardware glitches can still happen.

And P(A|A) is still 1, not any finite number of decibels.
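The arithmetic in the transistor example is easy to re-run. A rough sketch follows; treating "one glitch per 10^24 switching events" as the per-event error probability is my own simplifying assumption, not something the comment states, and the value of Avogadro's number is the standard 6.02214076e23.

```python
import math

transistors = 10**9        # a billion transistors
clock_hz = 10**9           # a billion switching opportunities per second
seconds = 10**6            # a million seconds of running time
switching_events = transistors * clock_hz * seconds   # a million billion billion

avogadro = 6.02214076e23
ratio_to_avogadro = switching_events / avogadro

# Assumed per-event error probability: one glitch in all of those events.
error_probability = 1 / switching_events
decibels = 10 * math.log10(error_probability / (1 - error_probability))

days = seconds / 86_400    # seconds per day

print(switching_events, ratio_to_avogadro, decibels, days)
```

The log-odds comes out very negative but finite, which is the contrast with P(A|A) = 1 that the comment is drawing.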

So you are saying that statement “0 and 1 are not probabilities” has probability of 1?

Nope. He’s saying that based on his best analysis, it appears to be the case.

This is undefined for P = 1. If you claim that that function is a real-valued bijection between probabilities and odds then P = 1 doesn’t work so you’re begging the question. Always take care to not divide by zero.

Whether or not real-world events can have a probability of 0 or 1 is a different question than “are 0 and 1 probabilities?”. They most certainly are.

I agree with this one. Without probabilities of 0 and 1, it’s not merely that some proofs of theorems need to be revised, it’s that probability theory simply doesn’t work anymore, as its very axioms fall apart.

I can give a statement that is absolutely certain, e.g. “x is true given that x is true”. It doesn’t teach me much about real life experiences, but it is infinitely certain. Likewise with probability 0. Please note that the probability is assigned to the territory here, not the map.

The fact that I can’t encounter these probabilities in real life has to do with my limits of sampling reality and interpreting it, being a flimsy brain, rather than the limits of probability theory.

You may not want to believe that probability theory contains 0 and 1, but like many other cases, Math doesn’t care about your beliefs.

If I roll a die, then one of the events that can happen will happen. That’s just saying that if S is my sample space, then P(S) = 1. Similarly, P(~S) = 0, which is just saying that impossible things won’t happen. The former statement is an axiom in the standard mathematical treatments of the subject. These statements may be trivial, but I distrust any mathematics that can’t handle trivial cases.

Rejecting 1 as a probability would be catastrophic when you’re dealing with discrete spaces. If you’re the sort to reject infinity, then it would follow that all probability spaces are discrete. At that point probability loses its rigor. Preference for odds or log odds just means that you have to live with using the extended reals with special conventions for the infinities.

You can reject infinity without being able to enumerate every possibility. Your sample space will never practically contain all the possibilities. (How many times has something you never thought of happened?) There are 2^(however many bits of input come into my brain) possibilities for me to observe for any period of time, and I can never think about all of them. Any explicit sample space is going to miss possibilities. S is not well-defined.

I think the point of the post was that 1 shouldn’t be used for practical cases.

Real life is complex enough that there is merit to the philosophical position that one should refrain from assigning probabilities of 0 or 1 to nontrivial events. Categorically denying that any event can have probability 0 or 1 is an extreme position (which, applied to itself, would really mean that a given event would have a high probability of not occurring with probability 0 or 1).

From the purely mathematical standpoint, removing 0 and 1 from the set of possible probabilities breaks the current foundations of the theory. The existence of a sample space containing all possibilities does not depend on whether we humans can comprehend them all. If the sample space of all possibilities exists and P(S) < 1, then a lot of theorems break down. That’s where you live with idealizations like absolute certainty (or almost certainty in the infinite case) or else find something other than probability to use to model the real world.

In theory, if you could list every possible observation you could make, that will have a 1 probability. It would take infinite time, because the following class of outcomes:

has an infinite cardinality. I could get into how Godel means you can’t even in principle describe all possible outcomes in a finite amount of space, even by referencing classes like I did, but I’ll leave that up to you.

There was a suggested fix to your problem in the post, why isn’t that good enough for you?

Sounds like he agrees that S has probability 1.

Note: I agree that the way he “proves” the claim is not very good. He basically tries to switch your intuition by switching the wording of the question. Not too rigorous.

When I say that the possibilities can be listed in principle, what I mean is that there is some set S that contains them, and I make no reference to any practical problems with describing or storing its elements. Like the points and lines of geometry, it’s a Platonic idealization.

Because talk of magical symbols is a good sign that the passage was meant to ridicule the use of infinity. The very next paragraph seeks to expunge such “magical symbols” from probability theory.

If he has a rigorous way to ground probability theory without 0 and 1, I’m fine with it. He seemed to be saying that he wishes there was such a way, but until someone develops one, he’s stuck with magical symbols. He acknowledges all your problems in the end of the post.

This article is largely incoherent. The main justification is the abuse of an invalid transformation: y=x/(1-x) is not the bijection that he asserts it is, because it’s not a function that maps [0,1] onto R. It’s a function that maps [0,1] onto [0,∞] as a subset of the topological closure of R. And that’s okay, but you can’t say “well I don’t like the topological closure of R, so I’ll just use R and claim that 1 is where the problem is.”

Additionally, his discussion of log odds and such is perfectly fine, but ignores the fact that there are places where you do need to have an odds of 0:1, or a log odds of negative infinity. Probability theory stops working when you throw out 0 and 1, it’s as simple as that.

Even if you don’t want to handle tautologies or contradictions, there are other ways to get P(X)=0 or 1. The probability that a real number chosen uniformly from the real interval [0,1] equals any particular value is 0. It has to be. It’s a provable fact under ZFC, and to decide otherwise is to say that you’re more attached to the idea of 0 and 1 not being probabilities than you are to the fact that mathematics is consistent. If you really believe that, well, there’s absolutely nothing I have to say to you.

This is one of those situations where EY just demonstrates he knows very little mathematics.

How is that not a bijection? Specifically, a bijection between the sets [0,1[∪{1} and ℝ≥0∪{∞}, which seems exactly to be the claim EY is making.

On a broader point, EY was not calling into question the correctness or consistency of mathematical concepts or claims but whether they have any useful meaning in reality. He was not talking about the map, he was talking about the territory and how we may improve the map to better reflect the territory.
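To make the disputed mapping concrete, here is a small sketch of the odds transformation extended by a single point at infinity, which is how I read the bijection between [0,1] and ℝ≥0∪{∞} described above. The function names are hypothetical, and sending P = 1 to `math.inf` is exactly the special-case convention under discussion, not something that falls out of the formula.

```python
import math

def prob_to_odds(p: float) -> float:
    """Map [0, 1] to [0, inf], sending 1 to infinity by convention."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # p / (1 - p) would divide by zero at p == 1, so that point is special-cased.
    return math.inf if p == 1.0 else p / (1.0 - p)

def odds_to_prob(o: float) -> float:
    """Inverse map from [0, inf] back to [0, 1]."""
    if o < 0.0:
        raise ValueError("odds must be nonnegative")
    # inf / (1 + inf) is nan in floating point, so infinity is special-cased too.
    return 1.0 if math.isinf(o) else o / (1.0 + o)

for p in (0.0, 0.25, 0.5, 0.9, 1.0):
    print(p, prob_to_odds(p), odds_to_prob(prob_to_odds(p)))
```

Both directions need an explicit branch at the endpoint, which is one way of seeing why the two sides of this thread disagree about whether 1 "belongs" in the domain.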

As someone who doesn’t know much beyond basic statistics, in what way are 0 or 1 probabilities? Isn’t it just axiomatic truth at that point? In that sense saying zero and one are probabilities is just saying ‘certain’ or ‘impossible’ as far as I understand it. Situations where an event will definitely or definitely not occur don’t seem to be consistent with the idea of randomness which I’ve understood probability to revolve around.

I suppose the alternative would be that we’d have to assume every mathematical proof has infinite evidence if we wanted to get anywhere productive; after all, axioms are assumed to be true. It doesn’t make much sense to need evidence in that scenario, except perhaps the probability of error and mistake? That isn’t particularly calculable and would actually change from person to person.

Using one and zero makes sense to me as a matter of assumed or proven truths, but I’m still unsure how that makes it a probability.

Formally, probability is defined via areas. The basic idea is that the probability of picking an element from a set A out of a set B is the ratio of the areas of A to B, where “area” can be defined not only for things like squares but also things like lines, or actually almost every* subset of R. So, let’s say you want to randomly select a real number from the interval [0,1] and want to know the odds it falls in a set, S. The area of [0,1] is 1, so the answer is just the area of S.

If S={0}, then S has area zero. If S=[0,1), then S has area 1. Not only are both of these theoretical possibilities, they are practical ones too. There are real world examples of probability zero events (the only one that comes to mind involves QM though so I don’t want to bother with the details).

Now, notice that this isn’t the same thing as “impossible”. Instead, it means more like “it won’t happen I promise even by the time the universe ends”. The way I tend to think about probability zero events is that they are so unlikely they are beyond the reach of the principle that as the number of trials increases, events become expected. For any nonzero probability, there is a number of trials, n, such that once you do it n times the expected value becomes greater than 1. That’s not the case with probability zero events. Probability 1 events can then be thought of as the negation of probability 0 events.

*not actually “almost every” in a formal sense, but “almost any” in a “unless you go try to build a set that you can’t measure it probably has a well defined area” sense
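The claim above, that for any nonzero probability there is a number of trials n after which the expected number of occurrences exceeds 1, amounts to picking any n greater than 1/p. A minimal sketch, with a hypothetical helper name and exact fractions to avoid rounding issues:

```python
import math
from fractions import Fraction

def trials_until_expected(p: Fraction) -> int:
    """Smallest n such that the expected count n * p is strictly greater than 1."""
    if p <= 0:
        # For p == 0 the expected count is 0 for every n, so no such n exists.
        raise ValueError("no finite number of trials works when p is 0")
    return math.floor(1 / p) + 1

for p in (Fraction(1, 6), Fraction(1, 10**9), Fraction(1, 10**24)):
    n = trials_until_expected(p)
    print(p, n, n * p > 1)
```

The function has an answer for every positive p, however small, and none at all for p = 0, which is the distinction the comment is pointing at.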

That seems a solid enough explanation, but how can something of probability zero have a chance to occur? How then do you represent an impossible outcome? It seems like otherwise ‘zero’ is equivalent to ‘absurdly low’. That doesn’t quite jibe with my understanding.

Impossible things also have a probability of zero. I totally understand that this seems a bit unintuitive, and the underlying structure (which includes things like infinities of different sizes) is generally pretty unintuitive at first. Which is kinda just saying “sorry, I can’t explain the intuition,” which is unfortunately true.

I’m just going to think of it as taking the limit as evidence approaches infinity. Because a probability next to zero and zero are identical, zero then is a probability?

I think one of the clearest expositions on these issues is ET Jaynes. The first three chapters (which is some of the relevant part) can be found at http://bayes.wustl.edu/etj/prob/book.pdf.

“Not Found

The requested URL /etj/prob/book.pdf. was not found on this server.”

Fixed Jaynes link (no trailing period).

Oops. Thanks for the fix!

Ah. Thanks!

“Event” is a very broad notion. Let’s say, for example, that I roll two dice. The sample space is just a collection of pairs (a, b) where “a” is what die 1 shows and “b” is what die 2 shows. An event is any sub-collection of the sample space. So, the event that the numbers sum to 7 is the collection of all such pairs where a + b = 7. The probability of this event is simply the fraction of the sample space it occupies.

If I rolled eight dice, they would never sum to seven, and I would say that that event occurs with probability 0. If I secretly rolled an unknown number of dice, you could reasonably ask me the probability that they sum to seven. If I answer “0”, that just means that I rolled fewer than two or more than seven dice. It doesn’t make the process less random nor the question less reasonable.
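Here is a minimal sketch of the dice example, assuming fair six-sided dice. It enumerates the whole sample space and reports the fraction occupied by the event “the dice sum to 7”. The function name `probability_of_sum` is purely illustrative.

```python
from fractions import Fraction
from itertools import product


def probability_of_sum(n_dice, target=7):
    """Fraction of the sample space of n_dice fair dice whose faces sum to target."""
    # The sample space: every tuple (a, b, ...) of faces 1..6.
    sample_space = product(range(1, 7), repeat=n_dice)
    # The event: the sub-collection of tuples whose faces sum to the target.
    hits = sum(1 for roll in sample_space if sum(roll) == target)
    return Fraction(hits, 6 ** n_dice)


for n in (1, 2, 3, 7, 8):
    print(n, probability_of_sum(n))
```

Brute-force enumeration is slow for many dice (6^8 is about 1.7 million outcomes), but it mirrors the definition directly: an event is a subset of the sample space, and an empty subset simply has probability 0.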

If you treat an event as some question you can ask about the result of a random process, then 1 and 0 make a lot more sense as probabilities.

For the mathematical theory of probability, there are plenty of technical reasons why you want to retain 1 and 0 as probabilities (and once you get into continuous distributions, it turns out that probability 1 just means “almost certain”).

This is what I meant by something being a proven truth- within the rules set one can find outcomes which are axiomatically impossible or necessary. The process itself may be random, but calling it random when something impossible didn’t happen seems odd to me. The very idea that 1 may be not-quite-certain is more than a little baffling, and I suspect is the heart of the issue.

If 1 isn’t quite certain then neither is 0 (if something happens with probability 1, then the probability of it not happening is 0). It’s one of those things that pops up when dealing with infinity.

It’s best illustrated with an example. Let’s say we play a game where we flip a coin and I pay you $1 if it’s heads and you pay me $1 if it’s tails. With probability 1, one of us will eventually go broke (see Gambler’s ruin). It’s easy to think of a sequence of coin flips where this never happens; for example, if heads and tails alternated. The theory holds that such a sequence occurs with probability 0. Yet this does not make it impossible.

It can be thought of as the result of a limiting process. If I looked at sequences of N coin flips, counted the ones where no one went broke and divided this by the total number of possible sequences, then as I let N go to infinity this ratio would go to zero. This event occupies a region with area 0 in the sample space.
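The limiting process can be sketched with an exact count. The comment does not say how much money each player starts with, so the sketch assumes both start with the same `stake` (3 dollars by default), and the function name `survival_fraction` is made up for illustration.

```python
from fractions import Fraction


def survival_fraction(n_flips, stake=3):
    """Fraction of all 2**n_flips sequences in which neither player goes broke.

    Assumes both players start with `stake` dollars and bet $1 per flip.
    """
    total = 2 * stake
    # ways[w] = number of flip sequences so far that leave player A with
    # w dollars without either player having hit 0 along the way.
    ways = [0] * (total + 1)
    ways[stake] = 1
    for _ in range(n_flips):
        nxt = [0] * (total + 1)
        for w in range(1, total):
            nxt[w + 1] += ways[w]  # heads: A wins a dollar
            nxt[w - 1] += ways[w]  # tails: A loses a dollar
        # Sequences that reached 0 or `total` mean someone is broke; drop them.
        nxt[0] = 0
        nxt[total] = 0
        ways = nxt
    return Fraction(sum(ways), 2 ** n_flips)


for n in (0, 10, 50, 200):
    print(n, float(survival_fraction(n)))
```

Under these assumptions the ratio is strictly positive for every finite N (the alternating sequence always survives) yet shrinks toward 0 as N grows, which is the sense in which the “no one ever goes broke” event has probability 0 without being impossible.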

If the limit converges then it can hit 0 or 1. Got it. Thank you.

Eliezer isn’t arguing with the mathematics of probability theory. He is saying that in the subjective sense, people don’t actually have absolute certainty. This would mean that mathematical probability theory is an imperfect formalization of people’s subjective degrees of belief. It would not necessarily mean that it is impossible in principle to come up with a better formalization.

Errr… as I read EY’s post, he is certainly talking about the mathematics of probability (or about the formal framework in which we operate on probabilities) and not about some “subjective sense”.

The claim of “people don’t actually have absolute certainty” looks iffy to me, anyway. The immediate two questions that come to mind are (1) How do you know? and (2) Not even a single human being?

Of course if no one has absolute certainty, this very fact would be one of the things we don’t have absolute certainty about. This is entirely consistent.

If we’re asking what the author “really meant” rather than just what would be correct, it’s on record.

I… can’t really recommend reading the entire thread at the link, it’s kind of flame-war-y and not very illuminating.

I think the issue at hand is that 0 and 1 aren’t special cases at all, but very important for the math of probability theory to work (try and construct a probability measure where some subset doesn’t have probability 1 or 0).

This is incredibly necessary for the mathematical idea of probability, and EY seems to be confusing “are 0 and 1 probabilities relevant to Bayesian agents?” with “are 0 and 1 probabilities?” (yes, they are, unavoidably, not as a special case!).

It seems that EY’s position boils down to

And that’s a weak claim. EY’s ideas of what is “mentally healthier” are, basically, his personal preferences. I, for example, don’t find any mental health benefits in thinking about one over googolplex probabilities.

Cromwell’s Rule is not EY’s invention, and relatively uncontroversial for empirical propositions (as opposed to tautologies or the like).

If you don’t accept treating probabilities as beliefs and vice versa, then this whole conversation is just a really long and unnecessarily circuitous way to say “remember that you can be wrong about stuff”.

The part that is new compared to Cromwell’s rule is that Yudkowsky doesn’t want to give probability 1 to logical statements (53 is a prime number).

Because he doesn’t want to treat 1 as a probability, you can’t expect complete sets of events to have total probability 1, despite them being tautologies. Because he doesn’t want probability 0, how do you handle the empty set? How do you assign probabilities to statements like “A and B” where A and B are logically exclusive? (the coin lands heads AND the coin lands tails).

Removing 0 and 1 from the math of probability breaks most of the standard manipulations. Again, it’s best to just say “be careful with 0 and 1 when working with odds ratios.”

Nobody is saying EY invented Cromwell’s Rule, that’s not the issue.

The issue is that “0 and 1 are not useful subjective certainties for a Bayesian agent” is a very different statement than “0 and 1 are not probabilities at all”.

You’re right, I misread your sentence about “his personal preferences” as referring to the whole claim, rather than specifically the part about what’s “mentally healthy”. I don’t think we disagree on the object level here.

The way I view that statement is: “In our formalization, agents with absolutely certain beliefs cannot change those beliefs, we want our formalization to capture our intuitive sense of how an ideal agent would update its beliefs, a formalization with a quality of fanaticism does not capture our intuitive sense of how an ideal agent would update its beliefs, therefore we do not want a quality of fanaticism.”

And what state of the world would correspond to the statement “Some people have absolute certainty.” ? Do you think that we can take some highly advanced and entirely fictional neuroimaging technology, look at a brain and meaningfully say, “There’s a belief with probability 1.” ?

And on the other hand, I’m not afraid to talk about folk certainty, where the properties of an ideal mathematical system are less relevant, where everyone can remain blissfully logically uncertain to the fact that beliefs with probability 1 and 0 imply undesirable consequences in formal systems that possess them, and say things like “I believe that absolutely.” I am not afraid to say something like, “That person will not stop believing that for as long as he lives,” and mean that I predict with high confidence that that person will not stop believing that for as long as he lives.

And once you believe that the formalization is trying to capture our intuitive sense of an ideal agent, and decide whether or not that quality of fanaticism captures it, and decide whether or not you’re going to be a stickler about folk language, then I don’t think that any question or confusion around that claim remains.

People are not “ideal agents”. If you specifically construct your formalization to fit your ideas of what an ideal agent should and should not be able to do, this formalization will be a poor fit to actual, live human beings.

So either you make a system for ideal agents, in which case you’ll *still* run into some problems because, as has been pointed out upthread, standard probability math stops working if you disallow zeros and ones, or you make a system which is applicable to our imperfect world with imperfect humans.

I don’t see why both aren’t useful. If you want a descriptive model instead of a normative one, try prospect theory.

I just don’t see this article as an axiom that says probabilities of 0 and 1 aren’t allowed in probability theory. I see it as a warning not to put 0s and 1s in your AI’s prior. You’re not changing the math so much as picking good priors.

I think he’s just acknowledging the minute(?) possibility that our apparently flawless reasoning could have a blind spot. We could be in a Matrix, or have something tampering with our minds, etcetera, such that the implied assertion:

If this appears absolutely certain to me

Then it must be true

is indefensible.

There are two different things.

David_Bolin said (emphasis mine): “He is saying that *in the subjective sense*, people don’t actually have absolute certainty.” I am interpreting this as “people never subjectively feel they have absolute certainty about something” which I don’t think is true.

You are saying that *from an external (“objective”) point of view*, people can not (or should not) be absolutely sure that their beliefs/conclusions/maps are true. This I easily agree with.

It should probably be defined by calibration: do some people have a type of belief where they are always right?

Self-referential and anthropic things would probably qualify, e.g. “I believe I exist”.

You can phrase statements of logical deduction such that they have no premises and only conclusions. If we let S be the set of logical principles under which our logical system operates and T be some sentence that entails Y, then S AND T implies Y is something that I have absolute certainty in, even if this world is an illusion, because the premise of the implication contains all the rules necessary to derive the result.

A less formal example of this would be the sentence: If the rules of logic as I know them hold and the axioms of mathematics are true, then it is the case that 2+2=4.

A real mathematician got in a debate with EY over this post, and made some really good points: https://np.reddit.com/r/badmathematics/comments/2bazyc/0_and_1_are_not_probabilities_any_more_than/cj43y8k

Maybe this doesn’t stand up mathematically, but I really like the intuition of log odds instead of probability. And this post explained it quite well. And the main point that you shouldn’t believe in absolute certainties is still true. An ideal AI using probability theory would probably use log odds, and not have a 0 or 1.

/r/badmathematics is shuttered now, apparently.

Oh no, really? Who would have thought that the sorts of people who have learned to enjoy indulging contempt would eventually turn on each other.

I really wanted to see that argument though, tell me, to what extent *was* it an argument? Cause I feel like if a person in our school wanted to settle this, they’d just distinguish the practical cases EY’s talking about from the mathematical cases the conversants are talking about and everyone would immediately wake up and realise how immaterial the disagreement always was (though some of them might decide to be mad about that instead), but also, maybe Eliezer kind of likes getting people riled up about this so maybe dispersing the confusion never crossed his mind. Contempt vampires meet contempt bender. Kismesis is forged.

I shouldn’t contribute to this “fight”, but I can’t resist. I’d have recommended he bring up how the brunt of the causal network formalization explicitly disallows certain or impossible events on the math level once you cross into a certain level of sophistication (I forget where the threshold was, but I remember thinking “well the bayesian networks that support 0s and 1s sound pretty darn limited and I’m going to give up on them just as my elders advised.”)

Ultimately, the “can’t be 0 or 1” restriction is pretty obviously needed for a lot of the formulas to work robustly (you can’t even use the definition of conditional probability without restricting the prior of the evidence! Cause there’s a division in it! There are lots of divisions in probability theory!)
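As a rough sketch of the division problem, here is Bayes’ rule for a single hypothesis H and one piece of evidence E, written with exact fractions. The function name `posterior` and its parameters are made up for illustration; nothing here comes from a particular library.

```python
from fractions import Fraction


def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) = P(E | H) P(H) / P(E), with P(E) from the law of total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    # If P(E) is 0 this division is undefined and raises ZeroDivisionError.
    return p_e_given_h * prior / p_e


half = Fraction(1, 2)
print(posterior(half, Fraction(9, 10), Fraction(1, 10)))  # an ordinary update

# A prior of exactly 1 is untouched by evidence that is only unlikely under H.
print(posterior(Fraction(1), Fraction(1, 10), Fraction(9, 10)))

# A prior of 1 combined with evidence that H rules out gives P(E) = 0.
try:
    posterior(Fraction(1), Fraction(0), Fraction(9, 10))
except ZeroDivisionError:
    print("undefined: conditioning on probability-zero evidence")
```

With priors strictly between 0 and 1 and likelihoods that are not both 0, `p_e` is always positive, so the division is safe. That is one way to read the restriction proposed here.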

So I propose that we give a name to that restriction, and I offer the name “credences”. (Currently, it seems the word “credence” is just assigned to a bad overload of “probability” that uses percent notation instead of normal range. I doubt anyone will miss it.)

A probability is a credence iff it is neither 0 nor 1. A practical real-world right and justly radically skeptical bayesian reasoner should probably restrict a large, well-delineated subset of its evidence weights to being credences.

And now we can talk about credences and there’s no need for any more confusion, if we want.

It’s back btw. If it ever goes down again you can probably get it on wayback machine. And yes the /r/bad* subreddits are full of terrible academia snobbery. Badmathematics is the best of the bunch because mathematics is at least kind of objective. So they mostly talk about philosophy of mathematics.

The problem is formal models of probability theory have problems with logical uncertainty. You can’t assign a nonzero probability to a false logical statement. All the reasoning about probability theory is around modelling uncertainty in the unknown external world. This is an early attempt to think about logical uncertainty, which MIRI has now published papers on and tried to formalize.

Just calling them “log odds” is fine and they are widely used in real work.

Btw what does “Response to previous version” mean? Was this article significantly edited? It doesn’t seem so confrontational reading it now.

We published new versions of a lot of sequences posts a few months ago. If you click on the “Response to previous version” text, you can read the original text that the comment was referring to.

Wait, these old posts have been edited? I don’t see the “Response to previous version” link. I’d like to read the originals, as they were written, in chronological order… there are other ways to consume the compendium if I so desired.

Yeah, they were edited as part of the process of compiling Rationality: AI to Zombies. Usually that just involved adding some sources, cleaning up some sentences and fixing some typos.

The “Response to previous version” link is at the top of every comment that was posted on the previous version of the post. See here:

https://res.cloudinary.com/lesswrong-2-0/image/upload/v1577074008/Screen_Shot_2019-12-22_at_7.48.08_PM_n9bcp3.png

I see it now. Is there some way to make the original article the default View? Or a link to the prior version at the top of the article?

You can click on the date-stamp at the top of the post and select the earliest version from there.

Hmm. Reading.

Okay. Summary: All of Eliezer’s writing on this assumed the context of AGI/applied epistemology. That wasn’t obvious from the materials, and it did not occur to this group of pure mathematicians to assume that same focus, because they’re pure mathematicians and because of the activity they had decided to engage in on that day.

I’m years late to this party, and probably missing something obvious. But I’m confused by Yudkowsky’s math here. Wouldn’t it be more correct to say that the prior odds of rolling a `1` are `1:5`, which corresponds to a probability of 1/6 or 0.1666...? If odds of `1:5` correspond to a probability of `1/5` = `0.20`, that makes me think there are 5 sides to this six-sided die, each side having equal probability.

Put differently: when I think of how to convert odds back into a probability number, the formula my brain settles on is not `P = o / (1 + o)` as stated above, but rather `P = L / (L + R)`, if the odds are expressed as `L:R`. Am I missing something important about common probability practice / jargon here?

The real number 0.20 isn’t a probability, it’s just the same odds but written in a different way to make it possible to multiply (specifically you want some odds product `*` such that `A:B * C:D = AC:BD`). You are right about how you would convert the odds into a probability at the end.
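A quick sketch shows that the two conversion formulas agree once the odds `L:R` are read as the single number `o = L/R`. It also shows the componentwise odds product. All three function names are made up for illustration.

```python
from fractions import Fraction


def odds_to_probability(left, right):
    """P = L / (L + R) for odds written as L:R."""
    return Fraction(left, left + right)


def ratio_to_probability(o):
    """P = o / (1 + o) for odds written as the single number o = L / R."""
    return o / (1 + o)


def multiply_odds(a, b):
    """Odds product: A:B * C:D = AC:BD, with odds given as (left, right) pairs."""
    return (a[0] * b[0], a[1] * b[1])


# Odds of 1:5 for rolling a 1 on a fair six-sided die.
print(odds_to_probability(1, 5))            # 1/6
print(ratio_to_probability(Fraction(1, 5)))  # 1/6 again; 1/5 was the odds ratio
print(multiply_odds((1, 5), (2, 1)))         # (2, 5)
```

So the `0.20` in the question is the odds ratio `o`, not a probability; feeding it through `o / (1 + o)` recovers 1/6.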