# Conservation of Expected Evidence

Friedrich Spee von Langenfeld, a priest who heard the confessions of condemned witches, wrote in 1631 the *Cautio Criminalis* (“prudence in criminal cases”), in which he bitingly described the decision tree for condemning accused witches: If the witch had led an evil and improper life, she was guilty; if she had led a good and proper life, this too was a proof, for witches dissemble and try to appear especially virtuous. After the woman was put in prison: if she was afraid, this proved her guilt; if she was not afraid, this proved her guilt, for witches characteristically pretend innocence and wear a bold front. Or on hearing of a denunciation of witchcraft against her, she might seek flight or remain; if she ran, that proved her guilt; if she remained, the devil had detained her so she could not get away.

Spee acted as confessor to many witches; he was thus in a position to observe *every* branch of the accusation tree, that no matter *what* the accused witch said or did, it was held as proof against her. In any individual case, you would only hear one branch of the dilemma. It is for this reason that scientists write down their experimental predictions in advance.

But *you can’t have it both ways*, as a matter of probability theory, not mere fairness. The rule that “absence of evidence *is* evidence of absence” is a special case of a more general law, which I would name Conservation of Expected Evidence: the *expectation* of the posterior probability, after viewing the evidence, must equal the prior probability.

*Therefore,* for every expectation of evidence, there is an equal and opposite expectation of counterevidence.

If you expect a strong probability of seeing weak evidence in one direction, it must be balanced by a weak expectation of seeing strong evidence in the other direction. If you’re very confident in your theory, and therefore anticipate seeing an outcome that matches your hypothesis, this can only provide a very small increment to your belief (it is already close to 1); but the unexpected failure of your prediction would (and must) deal your confidence a huge blow. On *average*, you must expect to be *exactly* as confident as when you started out. Equivalently, the mere *expectation* of encountering evidence—before you’ve actually seen it—should not shift your prior beliefs.
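To make the balance concrete, here is a minimal sketch in Python. The function name `update_on_binary_evidence` and all the numbers are assumptions chosen for illustration; the only thing taken from the argument above is the claim that the probability-weighted average of the two possible posteriors equals the prior.

```python
from fractions import Fraction

def update_on_binary_evidence(prior, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H|E), P(H|~E)) for a binary observation E."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    post_if_e = p_e_given_h * prior / p_e
    post_if_not_e = (1 - p_e_given_h) * prior / (1 - p_e)
    return p_e, post_if_e, post_if_not_e

# Hypothetical numbers, chosen only for illustration.
prior = Fraction(9, 10)                  # P(H): already confident
p_e, post_if_e, post_if_not_e = update_on_binary_evidence(
    prior,
    p_e_given_h=Fraction(95, 100),       # P(E|H): the prediction usually comes true
    p_e_given_not_h=Fraction(20, 100),   # P(E|~H)
)

# Average the two possible posteriors, weighted by how likely each is.
expected_posterior = post_if_e * p_e + post_if_not_e * (1 - p_e)

print("P(E)         =", float(p_e))
print("P(H|E)       =", float(post_if_e))
print("P(H|~E)      =", float(post_if_not_e))
print("E[posterior] =", float(expected_posterior))
```

With these assumed inputs, the likely outcome (probability 7/8) lifts the belief from 0.9 to 171/175, about 0.977, while the unlikely outcome (probability 1/8) drops it to 0.36. The weighted average comes out to exactly 0.9, the prior.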

So if you claim that “no sabotage” is evidence *for* the existence of a Japanese-American Fifth Column, you must conversely hold that seeing sabotage would argue *against* a Fifth Column. If you claim that “a good and proper life” is evidence that a woman is a witch, then an evil and improper life must be evidence that she is not a witch. If you argue that God, to test humanity’s faith, refuses to reveal His existence, then the miracles described in the Bible must argue against the existence of God.

Doesn’t quite sound right, does it? Pay attention to that feeling of *this seems a little forced*, that quiet strain in the back of your mind. It’s important.

For a true Bayesian, it is impossible to seek evidence that *confirms* a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on *average*) than before. You can only ever seek evidence to *test* a theory, not to confirm it.

This realization can take quite a load off your mind. You need not worry about how to interpret every possible experimental result to confirm your theory. You needn’t bother planning how to make *any* given iota of evidence confirm your theory, because you know that for every expectation of evidence, there is an equal and opposite expectation of counterevidence. If you try to weaken the counterevidence of a possible “abnormal” observation, you can only do it by weakening the support of a “normal” observation, to a precisely equal and opposite degree. It is a zero-sum game. No matter how you connive, no matter how you argue, no matter how you strategize, you can’t possibly expect the resulting game plan to shift your beliefs (on average) in a particular direction.

You might as well sit back and relax while you wait for the evidence to come in.

. . . Human psychology is *so* screwed up.

One minor correction, Eliezer: the link to your essay uses the text “An Intuitive Expectation of Bayesian Reasoning.” I think you titled that essay “An Intuitive EXPLANATION of Bayesian Reasoning.” (I am 99.9999% sure of this, and would therefore pay especial attention to any evidence inconsistent with this proposition.)

I guess I was a Bayesian before I knew what it meant....

Perhaps this formulation is nice:

0 = (P(H|E)-P(H))P(E) + (P(H|~E)-P(H))P(~E)

The expected change in probability is zero (for if you expected change you would have already changed).

Since P(E) and P(~E) are both positive, to maintain balance if P(H|E)-P(H) < 0 then P(H|~E)-P(H) > 0. If P(E) is large then P(~E) is small, so (P(H|~E)-P(H)) must be large to counteract (P(H|E)-P(H)) and maintain balance.
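As a rough illustration of that balance, the sketch below solves the formula for P(H|~E) once the prior, P(E), and P(H|E) are fixed. The helper name `forced_posterior_if_absent` and the numbers are hypothetical.

```python
from fractions import Fraction

def forced_posterior_if_absent(prior, p_e, post_if_e):
    """Solve 0 = (P(H|E)-P(H))P(E) + (P(H|~E)-P(H))P(~E) for P(H|~E)."""
    p_not_e = 1 - p_e
    return prior - (post_if_e - prior) * p_e / p_not_e

# Hypothetical numbers: E is expected (P(E) = 0.9) and mildly supports H.
prior = Fraction(1, 2)
p_e = Fraction(9, 10)
post_if_e = Fraction(11, 20)      # a small upward shift of 0.05

post_if_not_e = forced_posterior_if_absent(prior, p_e, post_if_e)
print("P(H|~E) must be", float(post_if_not_e))
```

Here the small, likely gain of 0.05 has to be paid for by a large, unlikely loss of 0.45, so P(H|~E) is forced down to 0.05. If the function returns a value outside [0, 1], the three inputs cannot all hold at once.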

Hey, sorry if it’s mad trivial, but may I ask for a derivation of this? You can start with “P(H) = P(H|E)P(E) + P(H|~E)P(~E)” if that makes it shorter.

(edit):

Never mind, I just did it. I’ll post it for you in case anyone else wonders.

1} P(H) = P(H|E)P(E) + P(H|~E)P(~E) [CEE]

2} P(H)P(E) + P(H)P(~E) = P(H|E)P(E) + P(H|~E)P(~E) [because ab + (1-a)b = b]

3} (P(H) - P(H))P(E) + (P(H) - P(H))P(~E) = (P(H|E) - P(H))P(E) + (P(H|~E) - P(H))P(~E) [subtract P(H) from every value to be weighted]

4} (P(H) - P(H))P(E) + (P(H) - P(H))P(~E) = P(H) - P(H) = 0 [because ab + (1-a)b = b]

(conclusion)

5} 0 = (P(H|E) - P(H))P(E) + (P(H|~E) - P(H))P(~E) [by identity syllogism from lines 3 and 4]

P(H) = P(H|E)P(E) + P(H|~E)P(~E)

P(H)*(P(E)+P(~E))=P(H|E)P(E) + P(H|~E)P(~E)

P(H)P(E)+P(H)P(~E)=P(H|E)P(E) + P(H|~E)P(~E)

P(H)P(~E)=(P(H|E)-P(H))*P(E) + P(H|~E)P(~E)

0=(P(H|E)-P(H))*P(E) + (P(H|~E)-P(H))*P(~E)

The trick is that P(E)+P(~E)=1, and so you can multiply the left side by the sum and the right side by 1.

Eliezer,

Of course you are assuming a strong form of Bayesianism here. Why do we have to accept that strong form?

More precisely, I see no reason why there need be no change in the confidence level. As long as the probability is greater than 50% in one direction or the other, I have an expectation of a certain outcome. So, if some evidence slightly moves the expectation in a particular direction, but does not push it across the 50% line from wherever it started, what is the big whoop?

One reason is Cox’s theorem, which shows any quantitative measure of plausibility must obey the axioms of probability theory. Then this result, conservation of expected evidence, is a theorem.

What is the “confidence level”? Why is 50% special here?

“Of course you are assuming a strong form of Bayesianism here. Why do we have to accept that strong form?”

Because it’s mathematically proven. You might as well ask “Why do we have to accept the strong form of arithmetic?”

“So, if some evidence slightly moves the expectation in a particular direction, but does not push it across the 50% line from wherever it started, what is the big whoop?”

Because (in this case especially!) small probabilities can have large consequences. If we invent a marvelous new cure for acne, with a 1% chance of death to the patient, it’s well below 50% and no specific person using the “medication” would *expect* to die, but no sane doctor would ever sanction such a “medication”.

“Why is 50% special here?”

People seem to have a little arrow in their heads saying whether they “believe in” or “don’t believe in” a proposition. If there are two possibilities, 50% is the point at which the little arrow goes from “not believe” to “believe”.

It’s based on premises that may or may not be accurate. Just because it’s mathematically proven, doesn’t mean it’s true.

And if I am following you, this is irrational. Correct?

More importantly, it’s physically proven. The fact that the math is consistent (and elegant!) would not have been so powerful if it wasn’t also true, particularly since Bayesianism implies some very surprising predictions.

Fortunately, it is the happy case that, to the best of my knowledge, no experiments thus far contradict Bayesianism, and not for the lack of trying, which is as much proof as physically possible.

Foundational issues like Bayesianism run into the old philosophy of science problems with a vengeance: which part of the total assortment of theory and observation do you choose to throw out? If someone proves a paradox in Bayesianism, do you shrug and start looking at alternatives, or do you ‘defy the evidence’ and patiently wait for an E. T. Jaynes to come along and explain how the paradox stems from taking an improper limit or failing to take into account prior information etc.?

(I’ll adopt the seemingly rationalist trait of never taking questions as rhetorical, though both your questions strongly have that flavor).

A central part of the modern scientific method is due to Popper, who gave an essentially Bayesian answer to your first question. However, Science wouldn’t fall apart if it turned out that priors aren’t a physical reality. Occam’s razor is non-Bayesian, and it alone accounts for a large portion of our scientific intuitions. At the bottom line, the scientific method doesn’t have to be itself true in order to be effective in discovering truths and discarding falsehoods.

The concept of “proving a paradox” is unclear to me (almost a paradox in itself...). Paradoxes are mirages. Also, it seems that you have some specific piece of scientific history in mind, but I’m uncertain which.

Luckily, we did have Jaynes and others to promote what I believe to be both a compelling mathematical framework and a physical reality. Before them, well, it would be wishful to think I could hold on to Bayesian ideas in the face of apparent paradoxes. The shoulders of giants etc.

Occam’s Razor is non-Bayesian? Correct me if I’m wrong, but I thought it falls naturally out of Bayesian model comparison, from the normalization factors, or “Occam factors.” As I remember, the argument is something like: given two models with independent parameters {A} and {A,B}, P(AB model) ∝ P(AB are correct) and P(A model) ∝ P(A is correct). Then P(AB model) ≤ P(A model).

Even if the argument is wrong, I think the result ends up being that more plausible models tend to have fewer independent parameters.
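One standard way to see an Occam factor at work is a coin-flip comparison, sketched below. This is my own toy example, not something from the thread: a model with no free parameter (the coin is exactly fair) against a model with one free parameter (unknown bias, with an assumed uniform prior). The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def likelihood_fair_coin(heads, tails):
    """P(sequence | coin is exactly fair): no free parameter."""
    return Fraction(1, 2) ** (heads + tails)

def marginal_likelihood_unknown_bias(heads, tails):
    """P(sequence | bias p unknown, uniform prior on p).

    This is the integral of p**heads * (1-p)**tails over [0, 1],
    which equals heads! * tails! / (heads + tails + 1)!.
    """
    return Fraction(factorial(heads) * factorial(tails),
                    factorial(heads + tails + 1))

def bayes_factor_simple_over_flexible(heads, tails):
    """Ratio > 1 favors the fair-coin model, < 1 favors the flexible one."""
    return (likelihood_fair_coin(heads, tails)
            / marginal_likelihood_unknown_bias(heads, tails))

balanced = bayes_factor_simple_over_flexible(5, 5)   # data both models can fit
lopsided = bayes_factor_simple_over_flexible(9, 1)   # data the fair coin fits poorly

print("5 heads, 5 tails:", float(balanced))
print("9 heads, 1 tail: ", float(lopsided))
```

When both models can fit the data (5 heads, 5 tails), the simpler one wins by a factor of 693/256, roughly 2.7, because the flexible model had to spread its probability over many possible biases. The extra parameter only pays for itself when the data demand it, as with 9 heads and 1 tail, where the ratio falls to 55/512.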

You’re not really wrong. The thing is that “Occam’s razor” is a conceptual principle, not one mathematically defined law. A certain (subjectively very appealing) formulation of it does follow from Bayesianism.

Your math is a bit off, but I understand what you mean. If we have two sets of models, with no prior information to discriminate between their members, then the prior gives less probability to each model in the larger set than in the smaller one.

More generally, if deciding that model 1 is true gives you more information than deciding that model 2 is true, that means that the maximum entropy given model 1 is lower than that given model 2, which in turn means (under the maximum entropy principle) that model 1 was a-priori less likely.

Anyway, this is all beside the discussion that inspired my previous comment. My point was that even without Popper and Jaynes to enlighten us, science was making progress using other methods of rationality, among which are myriad non-Bayesian interpretations of Occam’s razor.

Crystal clear. Sorry to distract from the point.

How does deciding one model is true give you more information? Did you mean “If a model allows you to make more predictions about future observations, then it is a priori less likely?”

Let’s assume a strong version of Bayesianism, which entails the maximum entropy principle. So our belief is the one that has the maximum entropy, among those consistent with our prior information. If we now add the information that some model is true, this generally invalidates our previous belief, making the new maximum-entropy belief one of lower entropy. The reduction in entropy is the amount of information you gain by learning the model. In a way, this is a cost we pay for “narrowing” our belief.

The upside of it is that it tells us something useful about the future. Of course, not all information regarding the world is relevant for future observations. The part that doesn’t help control our anticipation is failing to pay rent, and should be evacuated. The part that does inform us about the future may be useful enough to be worth the cost we pay in taking in new information.

I’ll expand on all of this in my sequence on reinforcement learning.
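A toy calculation of the entropy reduction described above, under loudly stated assumptions: eight equally plausible outcomes to start with, and a hypothetical model that rules out all but two of them. Neither number comes from the discussion; they only make the arithmetic easy to follow.

```python
from math import log2

def entropy_bits(distribution):
    """Shannon entropy, in bits, of a list of probabilities."""
    return -sum(p * log2(p) for p in distribution if p > 0)

# Assumed setup: eight outcomes, nothing known, so the
# maximum-entropy belief is uniform.
before = [1 / 8] * 8

# Suppose the model rules out all but two outcomes and says nothing
# more; the maximum-entropy belief is then uniform over what remains.
after = [1 / 2, 1 / 2, 0, 0, 0, 0, 0, 0]

information_gained = entropy_bits(before) - entropy_bits(after)
print("bits gained by learning the model:", information_gained)
```

The belief goes from 3 bits of entropy to 1 bit, so accepting the model is worth 2 bits. A model that ruled out more outcomes would carry more information and, on this view, would have been less likely a priori.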

At what point does the decision “This is true” diverge from the observation “There is very strong evidence for this”, other than in cases where the model is accepted as true *despite* a lack of strong evidence?

I’m not discussing the case where a model goes from unknown to known. How does *deciding* to believe a model give you more information than knowing what the model is and the reason for the model? To better model an actual agent, one could replace all of the knowledge about why the model is true with the value of the strength of the supporting knowledge.

How does deciding that things always fall down give you more information than observing things fall down?

I believe the idea was to ask “hypothetically, if I found out that this hypothesis was true, how much new information would that give me?”

You’ll have two or more hypotheses, and one of them is the one that would (hypothetically) give you the least amount of new information. The one that would give you the least amount of new information should be considered the “simplest” hypothesis. (assuming a certain definition of “simplest”, and a certain definition of “information”)

Aaron, fixed.

Eliezer, when you’re lost in an unfamiliar neighbourhood, do you sit back, relax and wait for evidence of your location to come in? Obviously not, since you’re still alive and haven’t yet starved to death. Well guess what, none of *my* direct ancestors starved to death before they reproduced either. That’s a scientific fact, and it just goes to show that when it comes to the thinking game, nothing succeeds like success.

And success is not the same as accuracy, except in a mystical world of spherical cows of uniform density. In the real world, the value of a perfectly correct decision which takes an infinite amount of time to evaluate is exactly 0. [Note the lack of units; renormalize if you dare.] I have empirical data showing that this world contains distinctly heterogeneous cows. Reprints available upon request.

Tom,

Bayes’ Theorem has its limits. The support must be continuous, the dimensionality must be finite. Some of the discussion here has raised issues that could be relevant to these kinds of conditions, such as fuzziness about the truth or falsity of H. This is not as straightforward as you claim it is.

Furthermore, I remind one and all that Bayes’ Theorem is asymptotic. Even if the conditions hold, the “true” probability is approached only in the infinite time horizon. This could occur so slowly that it might stay on the “wrong” side of 50% well past the time that any finite viewer might hang around to watch.

There is also the black swan problem. It could move in the wrong direction until the black swan datum finally shows up pushing it in the other direction, which, again, may not occur during the time period someone is observing. This black swan question is exactly the frame of discussion here, as it is Taleb who has gone on and on about this business about evidence and absence thereof.

You cannot predict a black swan. That’s why it can screw up your expectation.

However, once you have a black swan you’d be an irrational fool not to include it in your expectation.

That’s the point. That’s why theories get updated—new data that nobody was aware of before does not match expectations. This new evidence adjusts the probability that the theory was correct, and it gets thrown out if a different theory now has a higher probability in light of the new evidence.

This is not a shortcoming of Bayes Theorem, it’s a shortcoming of observation.

*That* you should certainly be aware of. I.e. “I might not have all the facts.”

“you can’t possibly expect the resulting game plan to shift your beliefs (on average) in a particular direction.”

But you can act to change the probability distribution of your future beliefs (just not its mean). That’s the entire point of testing a belief. If you have a 50% belief that a ball is under a certain cup, then by lifting the cup, you can be certain that your future belief will be in the set {0%,100%} (with equal probability for 0 and 100, hence the same mean as now).

Getting the right shape of the probability distribution of future belief is the whole skill in testing a hypothesis.
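The cup example can be sketched in a few lines. The three tests below, and the 80% reliability of the “noisy peek”, are hypothetical additions; only the 50% prior and the lift-the-cup case come from the comment above. Each test leaves the mean of the future belief at 50% but gives it a different spread.

```python
from fractions import Fraction

def future_beliefs(prior, p_e_given_h, p_e_given_not_h):
    """Return [(probability, posterior), ...] for a binary test E."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    outcomes = []
    if p_e > 0:   # outcome E can happen
        outcomes.append((p_e, p_e_given_h * prior / p_e))
    if p_e < 1:   # outcome ~E can happen
        outcomes.append((1 - p_e, (1 - p_e_given_h) * prior / (1 - p_e)))
    return outcomes

def mean_and_variance(outcomes):
    """Mean and variance of the future belief over the test's outcomes."""
    mean = sum(p * belief for p, belief in outcomes)
    variance = sum(p * (belief - mean) ** 2 for p, belief in outcomes)
    return mean, variance

prior = Fraction(1, 2)  # 50% belief that the ball is under the cup
tests = {
    # E = "I see the ball"; the arguments are P(E|ball there), P(E|ball absent).
    "lift the cup": future_beliefs(prior, Fraction(1), Fraction(0)),
    "noisy peek": future_beliefs(prior, Fraction(4, 5), Fraction(1, 5)),  # assumed 80% reliable
    "look away": future_beliefs(prior, Fraction(1, 2), Fraction(1, 2)),   # uninformative
}
summary = {name: mean_and_variance(outcomes) for name, outcomes in tests.items()}

for name, (mean, variance) in summary.items():
    print(f"{name:13s} mean={float(mean):.2f} variance={float(variance):.2f}")
```

Lifting the cup gives the widest possible spread (variance 1/4), the noisy peek a narrower one (9/100), and looking away none at all. The mean stays at 1/2 in every case, which is the part no choice of test can change.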

But you can’t have it both ways—as a matter of probability theory, not mere fairness.You’ve proved your case—but there’s still enough wriggle room that it won’t make much practical difference. One example from global warming, which predicts higher temperature on average in Europe—unless it diverts the gulf stream, in which case it predicts lower average temperatures. Consider the two statements: 1) If average temperatures go up in Europe, or down, this is evidence for global warming. 2) If average temperatures go up in Europe, and the gulf stream isn’t diverted, or average temperatures go down, while the gulf stream is diverted, this is evidence of global warming.

1) is nonsense, 2) is true. Lots of people say statements that sound like 1), when they mean something like 2). Add an extra detail, and the symmetry is broken.

This weakens the practical power of your point; if an accused witch is afraid, that shows she’s guilty; if she’s not afraid, in a way which causes the inquisitor to be suspicious, she’s also guilty. That argument is flawed, but it isn’t a logical flaw (since the similar statement 2) is true).

Then we’re back to arguing the legitimacy of these “extra details”.

Stuart, if the extra details are observable and *specified in advance*, the legitimacy is clear-cut.

Barkley, I’m an infinite set atheist, all real-world problems are finite; and you seem to be assuming that priors are arbitrary but likelihood ratios are fixed eternal and known, which is a strange position; and in any case what does that have to do with something as simple as Conservation of Expected Evidence? If anyone attempts to make an infinite-set scenario that violates CEE, it disproves their setup by reductio ad absurdum, and reinforces the ancient wisdom of E. T. Jaynes that no infinity may be assumed except as the proven limit of a finite problem.

Eliezer,

I do not necessarily believe that likelihood ratios are fixed for all time. The part of me that is Bayesian tends to the radically subjective form a la Keynes.

Also, I am a fan of nonstandard analysis. So, I have no problem with infinities that are not mere limits.

Eliezer,

I just googled “law of conservation of expected evidence.” This blog came up. Nothing else like it. Frankly, I don’t think you are selling a law here. You are asserting one that nobody else is aware of.

“a more general law, which I would name Conservation of Expected Evidence”

I thought it was pretty clear that I was coining the phrase. I’m certainly not the first person to point out the law. E.g. Robin notes that our best estimate of anything should have no predictable trend. In any case, I posted the mathematical derivation and you certainly don’t have to take my word about anything.

Eliezer,

Fair enough. You get credit, then, for coining the term. However, the problem remains, why should that equals sign be there? Sure, if you put it there, the logic holds up, my niggles about Bayes’ Theorem and time to convergence and all that aside. But, it is not clear at all that the equals sign should be there, or is there in any meaningfully regular way. Your defense has been to cite an essentially empirical argument by Robin. But that empirical argument is much contested in many arenas. Sure, Burton Malkiel posed that financial markets are a random walk, but that argument has undergone a lot of modifications since he first posed it in a best-selling paperback. In that regard, your proof essentially amounts to one of these “proofs” of the existence of God, wherein the proof arises from another assumption that gets snuck in the backdoor that gets one the result, but that is itself as questionable or unprovable, much like the old complaint by Joan Robinson about the magician making a big deal about pulling the rabbit out of the hat after having put it into the hat in full view of the audience.

Barkley, it looks to me like Eli derived it using the sum and product rules of probability theory.

What Peter said. Barkley, do you question that P(H) = P(H,E) + P(H, ~E) or do you question that P(H,E) = P(H|E)*P(E)?
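The two identities in question combine into the law in a single step. A small numeric check, assuming a made-up joint distribution over H and E (the four numbers are illustrative and only need to sum to 1):

```python
from fractions import Fraction

# Hypothetical joint distribution P(H, E); keys are (H is true, E is true).
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}

def prob_e(e):
    """Marginal probability of seeing E (e=True) or ~E (e=False)."""
    return sum(p for (_, ev), p in joint.items() if ev == e)

def prob_h_given(e):
    """Product rule rearranged: P(H|E) = P(H,E) / P(E)."""
    return joint[(True, e)] / prob_e(e)

# Sum rule: P(H) = P(H,E) + P(H,~E)
p_h = joint[(True, True)] + joint[(True, False)]

# Expected posterior: P(H|E)P(E) + P(H|~E)P(~E)
expected_posterior = sum(prob_h_given(e) * prob_e(e) for e in (True, False))

print(p_h, expected_posterior)  # 2/5 2/5
```

Any other joint table gives the same agreement, because each term P(H|E)P(E) is just P(H,E) written another way.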

Eliezer and Peter, I think the problem is statics versus dynamics. Your set of equations are correct only at a specific point in time, which makes them irrelevant to saying anything about what happens later when new information arrives. That would entail subscripting H by time. For any given t, sure. But, that says nothing about what happens when new information arrives. P(H) might change.

The obvious example is indeed the black swan story, which we all know is what is lying behind this discussion. So, at a point in time before black swans are observed, let H be “all swans are white.” Perhaps there were a few folks who thought this might not be true, so say P(H) was 95%. Sure, your equations hold at a point in time, but so what? The minute the word comes in about the observation of a black swan (assuming it is accepted), P(H) just went to zero, or not much above zero, perhaps after having gradually drifted over time to 95%. Remember, your story was one about new information coming in and changes over time. But that is not what your equations are about.

This is the fatal flaw in your nice new law, Eliezer.

...

Barkley, you don’t realize that Bayes’s Theorem is *precisely* what describes the *normative* update in beliefs over time? That this is the *whole point* of Bayes’s Theorem?

Before black swans were observed, no one expected to encounter a black swan, and everyone expected to encounter another white swan on occasion. A black swan is huge evidence against, a white swan is tiny additional evidence for. Had they been normative, the two quantities would have balanced exactly.

I’m not sure what to say here. Maybe point to *Probability Theory: The Logic of Science* or *A Technical Explanation of Technical Explanation*? I don’t know where this misunderstanding is coming from, but I’m learning a valuable lesson in how much Bayesian algebra someone can know without realizing which material phenomena it describes.

“no one expected to encounter a white swan, and everyone expected to encounter another black swan on occasion. A white swan is huge evidence against, a black swan is tiny additional evidence for.” I presume you meant the reverse of this?

Oops, fixed.

per the Black Swan:

The set of potential multicolored variations of swans is infinite (purple, brown, grey, blue, green, etc.). We cannot prove that any one of them does not exist. But every day that passes where we don’t see these swans gives us a higher probability they do not exist. It never equals 1, but it’s darn close.

The problem with the Black Swan parable is not that it’s untrue, but rather that it is unimportant. The set of things we have no evidence of is infinite. To then pounce on an unexpected observation (e.g., a Black Swan, that Kevin Federline is a relatively good parent, last week’s liquidity run on mortgage lenders), and say, “aha! You were all wrong!” merely sets up a straw man, that everything we reasonably don’t anticipate and plan for is assumed to have had a probability of zero.

In reality, when you want to pay money for extreme events you overpay, that is, the implied probability is overweighted because sellers can’t insure against these events. London bookmakers offer only 250-1 odds against a perpetual motion machine being discovered, 100-1 that aliens won’t be proven. In option markets you have a volatility smile so that extreme events get higher and higher implied volatilities as you move away from the mean, meaning their probability is not assumed Gaussian.

The bottom line is that “absence of evidence is not evidence of absence” merely uses hindsight to attack a caricature of beliefs, and seems to suggest something practically important. In practice, people lose money on lottery tickets (or hurricane insurance, or buying a 3-delta put), so exploiting this is a fool’s game.

Eliezer,

This is about to scroll off, but, frankly, I do not know what you mean by “normative” in this context. The usual usage of this term implies statements about values or norms. I do not see that anything about this has anything to do with values or norms. Perhaps I do not understand the “whole point of Bayes’ Theorem.” Then again, I do not see anything in your reply that actually counters the argument I made.

Bottom line: I think your “law” is only true by assumption.

What I mean, Barkley, is that the expression P(H|E), as held at time t=0, *should*, normatively, describe the belief about H you will hold at time t=2 if you see evidence E at time t=1. Thus, statements true in probability theory about the decomposition of P(H) imply the *normative* law of Conservation of Expected Evidence, if you accept that probability theory is normative for real-world problems where no one has ever seen an infinite set.

If you don’t think probability theory is valid in the real world, I have some Dutch Book trades I’d like to make with you. But that’s a separate topic, and in any case, most readers of this blog will at least *understand what I intend to convey* when I speak from within the view that probability theory is normative.

Eliezer Yudkowsky, The word “normative” has stood in the way of my understanding what you mean, at least the first few times I saw you use it, before I pegged you as getting it from the heuristics and biases people. It greatly confused me many times when I first encountered them. It’s jargon, so it shouldn’t be surprising that different fields use it to mean rather different things.

The heuristics and biases people use it to mean “correct,” because social scientists aren’t allowed to use that word. I think there’s a valuable lesson about academics, institutions, or taboos in there, but I’m not sure what it is. As far as I can tell, they are the only people that use it this way.

My dictionary defines normative as “of, relating to, or prescribing a norm or standard.” It’s confusing enough that it carries those two or three meanings, but to make it mean “correct” as well is asking for trouble or in-groups.

I agree—it can be especially ambiguous if you’re also used to the economics context of normative, meaning “how subjectively desirable something is”.

This post was one of the most helpful for me personally, but I recently realized this isn’t true in an absolute sense: “There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before.”

Suppose the statement “I perform action A” is more probable given position P than given not-P. Then if I start planning to perform action A, this will be evidence that I will perform A. Therefore it will also be evidence for position P. So there is some plan that I can devise such that I can expect my confidence in P to be higher than before I devised the plan.

In general, of course, unless P is a position relating to my actions or habits, this effect will not be very large.

Um, no, if a study shows that people who chew gum also have a gene GXTP27 or whatever, which also protects against cancer, I cannot plan to increase my subjective probability that I have gene GXTP27 by starting to chew gum.

See also: “evidential decision theory”, and why nearly all decision theorists do not believe in it.

Here’s an example which doesn’t bear on Conservation of Expected Evidence as math, but does bear on the statement,

“There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before.”

taken at face value.

It’s called the Cable Guy Paradox; it was created by Alan Hájek, a philosopher at the Australian National University. (I personally think the term Paradox is a little strong for this scenario.)

Here it is: the cable guy is coming tomorrow, but cannot say exactly when. He may arrive any time between 8 am and 4 pm. You and a friend agree that the probability density for his arrival should be uniform over that interval. Your friend challenges you to a bet: even money for the event that the cable guy arrives before noon. You get to pick which side of the bet you want to take—by expected utility, you should be indifferent. Here’s the curious thing: if you pick the morning bet, then almost surely there will be times in the morning when you would prefer to switch to the afternoon bet.

This would seem to be a situation in which “you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before,” even though the equation P(H) = P(H|E)P(E) + P(H|~E)P(~E) is not violated. I’m not sure, but I think it’s due to multiple possible interpretations of the word “before”.

You either have a new interval, or new information suggesting the probability density for the interval has changed.

Conservation of Expected Evidence does not mean Ignorance of Observed Evidence.

This is just a restatement of the black swan problem, and it’s a non-issue. If evidence does not exist yet it does not exist yet. It doesn’t cast doubt on your methods of reasoning, nor does it allow you make a baseless guess of what might come in the future.

If you count the amount of “wanting to switch” you expect to have because the cable guy hasn’t arrived yet, it should equal exactly the amount of “wishing you hadn’t been wrong” you expect to have if you pick the second half because the cable guy arrived before your window started.

I’m not sure how to say this so it’s more easily parseable, but this equality is *exactly what conservation of expected evidence describes*.

At 10am tomorrow, I can legitimately express my confidence in the proposition “the cable guy will arrive after noon” is different to what it was today.

There are two cases to consider:

The cable guy arrived before 10am (occurs with 25% probability). In this case, I expect that he has a close-to-zero probability of arriving after noon.

The cable guy is known not to have arrived before 10am (occurs with 75% probability). At this point, I calculate that the odds of the cable guy turning up after noon are two in three.
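The two cases above can be checked with a few lines of arithmetic. This is a sketch under the comment's own assumptions (uniform arrival between 8 am and 4 pm, belief checked at 10 am); the variable and function names are illustrative.

```python
from fractions import Fraction

START, NOON, END = 8, 12, 16  # arrival window in hours, uniform density

def p_after_noon_given_not_arrived_by(t):
    """P(arrival after noon | no arrival before time t), for START <= t <= NOON."""
    return Fraction(END - NOON, END - t)

check_time = 10
p_arrived = Fraction(check_time - START, END - START)  # 1/4
p_not_arrived = 1 - p_arrived                          # 3/4

belief_if_arrived = Fraction(0)  # he already came in the morning
belief_if_not_arrived = p_after_noon_given_not_arrived_by(check_time)  # 2/3

# Probability-weighted average of the two possible 10 am beliefs.
expected_belief = (p_arrived * belief_if_arrived
                   + p_not_arrived * belief_if_not_arrived)
print(expected_belief)  # 1/2
```

The 75% chance of drifting up to 2/3 is exactly offset by the 25% chance of dropping to zero, so the expected 10 am belief is still the 1/2 held today.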

But none of this takes anything away from the original statement:

This is because I am changing my probability estimate on the basis of *new information received*; it’s not a fixed proposition.

Eliezer, what if the presence of the gene was decided by an omnipotent being called Omega? Then you’d break out the Spearmint, right?

I’ll modify my advice. If the probability that “I do action A in order to increase my subjective probability of position P” is greater given P than given not P, then doing A in order to increase my subjective probability of position P will be evidence in favor of P.

So in many cases, there will be such a plan that I can devise. Let’s see Eliezer find a way out of this one.

Before you have actually done A, since it might fail because of ~P (which is what the thing you said actually means), your confidence is still the same as before you came up with the plan. We’re still at t=0. Information about your plan succeeding or not hasn’t arrived yet.

Now if over the course of planning you realize that the very ability you have to make the plan shifts your probability estimate of P, then we’ve already got the new evidence. We’re at t=1, and the probability has shifted rightfully without violating the law. The evidence is no longer expected, it’s already here!

Before you started planning, you didn’t know that you would succeed and get this information. Not for certain. Or if you did, your estimate of probability of P was clearly wrong, but you hadn’t noticed it yet, where the “yet” is the time factor that distinguishes between t0 and t1 again...

Can’t cheat your way out of this at t=0, I’m afraid.

Let’s say you are organising a polar expedition. It will succeed (A) or fail (~A). There is a postulate that there are no man eating polar Cthulhu in the area (P). If there are some (~P), the expedition will fail (~A), thus entangling A with P.

You can do your best to prepare the expedition so that it will not fail for non-Cthulhu reasons, strengthening the entanglement - ~A becomes stronger evidence for ~P. You can also do your best to prepare the expedition to survive even the man eating polar Cthulhu, weakening the entanglement—by introducing a higher probability of A&~P, we’re making A weaker evidence for P.

Do any of these preparations, in themselves, actually influence the amount of man eating polar Cthulhu in the area?

Actually, the Omega situation is a perfect example. Someone facing the two boxes would like to increase his subjective probability that there is a million in the second box, and he is able to do this by deciding to take only the second box. If he decides to take both, on the other hand, he should decrease his credence in the presence of the million, even before opening the box.

In this case, the decision he’s leaning toward is evidence of the presence of $1M, by way of Omega’s observed reliability in predicting decisions of agents like him.

Fantastic heuristic! It’s like x=y·(z/y)+(1-y)·(x-z)/(1-y) for the rationalist’s soul :)

It’s worth noting, though, that you *can* rationally expect your credence in a certain belief “to increase”, in the following sense: If I roll a die, and I’m about to show you the result, your credence that it didn’t land 6 is now 5/6, and you’re 5/6 sure that this credence is about to increase to 1.

I think this is what makes people feel like they can have a non-trivial expected value for their new beliefs: you can *expect an increase* or *expect a decrease*, but quantitatively the two possibilities exactly cancel each other out in the *expected value* of your belief.

No, you can’t, because you also expect with 1/6 probability that your credence will go down to zero: 5/6 + (5/6 × 1/6) + (1/6 × −5/6) = 5/6.

In order to fully understand this concept, it helped me to think about it this way: any evidence shifting your expected change in confidence will necessarily cause a corresponding shift in your actual confidence. Suppose you hold some belief B with confidence C. Now some new experiment is being performed that will produce more data about B. If you had some prior evidence that the new data is expected to shift your confidence to C’, that same evidence would already have shifted C to C’, thus maintaining the conservation of expected evidence.

Consider the following example: initially, if someone were to ask you to bet on the veracity of B, you would choose odds C:(1-C). Suppose an oracle reveals to you that there is a 1/3 chance of the new data shifting your confidence to C+ and a 2/3 chance of it shifting to C-, giving C’ = (C + (C+)/3 − 2(C-)/3). What would you then consider to be fair odds on B’s correctness?

I have a theory that I will post this comment. By posting the comment, I’m seeking evidence to confirm the theory. If I post the comment, my probability will be higher than before.

Similarly, in Newcomb’s problem, I seek evidence that box A has a million dollars, so I refrain from taking box B. There was money in box B, but I didn’t take it, because that would give me evidence that box A was empty.

In short, there’s one exception to this: when your choice is the evidence.

The simple answer is that your choice is also probabilistic. Let’s say that your disposition is one that would make it very likely you will choose to take only box A. Then this fact about yourself becomes evidence for the proposition that A contains a million dollars. Likewise if your disposition was to take both, it would provide evidence that A was empty.

Now let’s say that you’re pretty damn certain that this Omega guy is who he says he is, and that he was able to predict this disposition of yours; then, noting your decision to take only A stands as strong evidence that the box contains the million dollars. Likewise with the decision to take both.

But what if, you say, I already expected to be the kind of person who would take only box A? That is, that the probability distribution over my expected dispositions was 95% only box A and 5% both boxes? Well then it follows that your prior over the contents of box A will be 95% that it contains the million and 5% that it is empty. And as a result, the likely case of you actually choosing to take only box A need only have a small effect on your expectation of the contents of the box (~.05 change to reach ~1), but in the case that you introspect and find that really, you’re the kind of person who would take both, then your expectation that the box has a million dollars will drop by exactly 19 (= .95/.05) times as much as it would get raised by the opposite evidence (resulting in ~0 chance that it contains the million). Making the less likely choice will create a much greater change in expectation, while the more common choice will induce a smaller change (since you already expected the result of that choice).

Hope that made sense.
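The 95%/5% arithmetic in the comment above can be spelled out as a short check. One simplifying assumption is added here: Omega is treated as perfectly reliable, so the two posteriors are exactly 1 and 0 rather than the approximate values the comment gives.

```python
from fractions import Fraction

# Assumption for illustration: a perfectly reliable Omega, so the posteriors
# are exactly 1 and 0 instead of "~1" and "~0".
p_one_box = Fraction(95, 100)   # probability you turn out to be a one-boxer
p_two_box = 1 - p_one_box       # probability you turn out to be a two-boxer

prior = p_one_box               # P(box A holds the million) before you decide
posterior_if_one_box = Fraction(1)
posterior_if_two_box = Fraction(0)

rise = posterior_if_one_box - prior   # 1/20, the likely small update
drop = prior - posterior_if_two_box   # 19/20, the unlikely large update

ratio = drop / rise                   # equals p_one_box / p_two_box
expected_change = p_one_box * rise - p_two_box * drop

print(ratio, expected_change)  # 19 0
```

The factor of 19 is the prior odds of one-boxing: the rarer outcome has to move the belief further by exactly that factor for the expected change to come out to zero.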

There is more discussion of this post here as part of the Rerunning the Sequences series.

Wouldn’t the rule be something more like:

((P(H|E) > P(H)) if and only if (P(H) > P(H|~E))) and ((P(H|E) = P(H)) if and only if (P(H) = P(H|~E)))

So, if some statement is evidence of a hypothesis, its negation must be evidence against. And if some statement’s truth value is independent of a hypothesis, then so is that statement’s negation.

This is implied by the expectation of posterior probabilities version. Since P(E) + P(~E) = 1, that means that P(H|E) and P(H|~E) are either equal, or one is greater than P(H) and one is less than. If they were both less than P(H), then P(H|E)P(E)+P(H|~E)P(~E) would have a lesser value than the largest conditional probability in that formula; suppose P(H|E) is the greater one, then P(H|E)P(E)+P(H|~E)P(~E) < P(H|E) and P(H|E) < P(H), so P(H|E)P(E)+P(H|~E)P(~E) ≠ P(H). If they are both larger than P(H), then P(H|E)P(E)+P(H|~E)P(~E) must be larger than the smallest conditional probability in that formula; suppose that P(H|E) is the smaller one, then we have P(H|E)P(E)+P(H|~E)P(~E) > P(H|E), and P(H|E) > P(H), so P(H) ≠ P(H|E)P(E)+P(H|~E)P(~E). And if both posterior probabilities are equal, then P(H|E)P(E)+P(H|~E)P(~E) = P(H|E), and both posteriors must equal the prior. Q.e.d.

I think that the formula that expresses the prior as the average of the posterior probability weighted by the probabilities of observing that evidence and not observing that evidence, is a great way to express the point of this article. But it might not be trivial for everyone to get:

from

That something is evidence in favor if and only if its negation is evidence against, and that some result is independent of some hypothesis if and only if not observing that result is independent of that hypothesis, are the take-home messages of this post as far as I can tell. The law that “P(H) = P(H|E)P(E) + P(H|~E)P(~E)” says more than that, it also tells you how to get P(H|~E) from P(H|E), P(H) and P(E). But adding the boolean statement and its proof from the weighted average statement to the post, or at least to a comment on this post, not even necessarily using the boolean symbols or formalisms, might help a lot of students that come across this long after algebra class. I know it would have helped me.
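As a sketch of the last point, the law can be rearranged to recover P(H|~E) from the other three quantities. The function name and the example numbers are illustrative assumptions, and the inputs are assumed to be mutually coherent probabilities (incoherent inputs can produce a result outside [0, 1]).

```python
from fractions import Fraction

def posterior_given_not_e(p_h, p_h_given_e, p_e):
    """Solve P(H) = P(H|E)P(E) + P(H|~E)P(~E) for P(H|~E)."""
    if not 0 <= p_e < 1:
        raise ValueError("P(E) must be in [0, 1) so that ~E is possible")
    return (p_h - p_h_given_e * p_e) / (1 - p_e)

p_h, p_e = Fraction(1, 2), Fraction(1, 4)

# E is evidence for H (4/5 > 1/2), so ~E must come out as evidence against.
print(posterior_given_not_e(p_h, Fraction(4, 5), p_e))  # 2/5

# E is independent of H (posterior equals prior), so ~E is independent too.
print(posterior_given_not_e(p_h, Fraction(1, 2), p_e))  # 1/2
```

The two calls illustrate both halves of the boolean statement: a posterior above the prior on one branch forces one below it on the other, and equality on one branch forces equality on the other.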

Hi, I’m new here but I’ve been following the sequences in the suggested order up to this point.

I have no problem with the main idea of this article. I say this only so that everyone knows that I’m nitpicking. If you’re not interested in nitpicking then just ignore this post.

I don’t think that the example given below is a very good one to demonstrate the concept of Conservation of Expected Evidence:

Assuming I’m reading this correctly:

Our Prior is P(G) = The probability that God Exists (let’s assume this is the Judeo-Christian God since that seems to be the intended target)

P(T) = the probability that God is Testing Humanity by not revealing his existence

P(M) = the probability that the Miracles of the bible are true.

The issue that I find with this is that P(G|T) = 1

If God is testing Humanity by hiding his existence there is a 100% chance that God Exists. I was going to write out the whole Bayesian equation to explain why this is true, but I think it’s pretty intuitive. P(T) cannot be evidence for P(G) since it assumes that P(G) is true.

Another issue is that the way this is written you’re implying that P(M) = P(~T). But this is not true, since the Miracles of the bible existing is not the direct opposite of God testing humanity by not revealing his existence. Unless you intend to completely twist the argument that most people are making when they assert P(T) as truth. They aren’t saying or even implying that God wants there to be no evidence at all of his existence. Most theists would instead argue that the existence of miracles is a part of God’s test for humanity. They say that God sent us miraculous signs and prophets instead of just coming down and saying “Hey humanity, I’m God” because he wanted to test our faith. Had they the mathematical language, they would say that P(T|M) > P(T), meaning M serves as evidence of T. Not P(M) = P(~T).

Though this whole concept of God “testing humanity by not revealing himself” does seem more like an example of Belief in belief, where P(T) was devised as a means to justify the existence of an invisible God, I still feel like the example you’ve given is a bit of a stretch.

I would say, rather, that:

G = God exists

N = The existence of God is not revealed directly to humanity

M = Miracles occur

...and we’re talking about P(G|N) and P(G|M) and not talking about P(T) at all.

More generally, T seems to be a red herring here.

That said, I agree that there’s a presumption that M implies ~N… that is, that if miracles occurred, that would constitute the direct revelation of God’s existence.

And yes, one could argue instead that no, miracles aren’t a revelation of God’s existence at all, but rather a test of faith. A lot depends here on what counts as a miracle; further discussion along this line would benefit from specificity.

I agree that T in and of itself is problematic.

Your N seems more likely what the author intended, now that you point it out.

Though I still don’t think anyone who thought about it for more than 20 seconds would ever assert that N could be used as evidence for G.

But using that as a model would probably serve well to underscore the point of Conservation of Expected Evidence:

If the fact that God has not been revealed directly to humanity is evidence for the existence of God, then should God ever reveal himself directly to humanity, it would be evidence against his existence.

That’s probably the statement Eliezer intended to make.

(nods)

And I would not be in the least surprised to find theologians arguing that the absence of direct evidence of God’s existence is itself proof of the existence of God, and I would be somewhat surprised to find that none ever had, but I don’t have examples.

That said, straw theism is not particularly uncommon on LW; when people want a go-to example of invalid reasoning, belief in god comes readily to hand. It derives from a common cultural presumption of atheism, although there are some theists around.

Is this the same as Jaynes’ method for construction of a prior using transformation invariance on acquisition of new evidence?

Does conservation of expected evidence always uniquely determine a probability distribution? If so, it should eliminate a bunch of extraneous methods of construction of priors. For example, you would immediately know if an application of MaxEnt was justified.

Eliezer, isn’t the “equal” part untrue? I like the parallel with Newton’s 3rd law, but the two terms P(H|E)*P(E) and P(H|~E)*P(~E) *aren’t* numerically equal; we only know that they sum to P(H).

The *changes* are equal and opposite:

[ P(H|E) - P(H) ]*P(E) + [ P(H|~E) - P(H) ]*P(~E) = 0

See Nick Hay’s much earlier comment.

P(H) is the belief where you start, and P(H|E) and P(H|~E) are the possible beliefs where you end. You could go to one with probability P(E) and to the other with probability P(~E), but due to the identity you quote, in expectation you do not move at all.

Old post, but isn’t evidence that disconfirms the theory X equal to confirming ~X? Is ~X ineligible to be considered a theory?

Everything in that quote applies just as much to disconfirming a theory as it does to confirming a theory. Conservation of expected evidence means that you cannot legitimately expect your confidence in a theory to go *down* either.

The hyperlink “An Intuitive Explanation of Bayesian Reasoning” is broken. The current location of that essay is here: http://yudkowsky.net/rational/bayes

Mantel cox log rank tests compare observations and expectations too...

Can someone tell me if I understand this correctly: He is saying that we must be clear beforehand what constitutes evidence for and what constitutes evidence against and what doesn’t constitute evidence either way?

Because in his examples it seems that what is being changed is what counts as evidence. It seems that no matter what transpires (in the witch trials for example) it is counted as evidence for. This is not the same as changing the hypothesis to fit the facts. The hypothesis was always ‘she’s a witch’. Then the evidence is interpreted as supportive of the hypothesis no matter what.

You don’t necessarily have to figure it out beforehand (though it’s certainly harder to fool yourself if you do). But if X is evidence for Y then not-X has to be evidence for not-Y.

And yes, one thing that’s going wrong in those witch trials is that both X and not-X are being treated as evidence for Y, which can’t possibly be correct. (And the way in which it’s going wrong is that the prosecutor correctly observes that Y *could produce* X or not-X, whichever of the two actually happened to turn up, and fails to distinguish between that and showing that Y is *more likely to produce* that outcome than not-Y, which is what would actually make the evidence go in the claimed direction.)

Did anyone say it is? I’m not seeing where.

Hi, new here.

I was wondering if I’ve interpreted this correctly:

‘For a true Bayesian, it is impossible to seek evidence that confirms a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before. You can only ever seek evidence to test a theory, not to confirm it.’

Does this mean that it is impossible to prove the truth of a theory? Because the only evidence that can exist is evidence that falsifies the theory, or supports it?

For example, something people know about gravity and objects under its influence is that on Earth objects will accelerate at something like 9.81 m/s². If we dropped a thousand different objects and observed their acceleration, and found it to be 9.81 m/s², we would have a thousand pieces of evidence supporting the theory, and zero pieces to falsify the theory. We all believe that 9.81 is correct, and we teach that it is the truth, but we can never really know, because new evidence could someday appear that challenges the theory, correct?

Thanks

It is correct that we can never find enough evidence to make our certainty of a theory exactly 1 (though we can get it very close to 1). If we were absolutely certain of a theory, then *no* amount of counterevidence, no matter how damning, could ever change our mind.

The important part of the sentence here is *seek*. This isn’t about falsificationism, but about the fact that no experiment you can do can confirm a theory without having some chance of falsifying it too. So any observation can only provide evidence for a hypothesis if a different outcome could have provided the opposite evidence.

For instance, suppose that you flip a coin. You can seek to *test* the theory that the result was `HEADS`, by simply looking at the coin with your eyes. There’s a 50% chance that the outcome of this test would be “you see the `HEADS` side”, confirming your theory (`P(HEADS | you see HEADS) ~ 1`). But this only works because there’s also a 50% chance that the outcome of the test would have shown the result to be `TAILS`, falsifying your theory (`P(HEADS | you see TAILS) ~ 0`). And in fact there’s no way to measure the coin so that one outcome would be evidence in favour of `HEADS` (`P(HEADS | measurement) > 0.5`), without the opposite result being evidence against `HEADS` (`P(HEADS | ¬measurement) < 0.5`).

Closely related is the law of total expectation: https://en.wikipedia.org/wiki/Law_of_total_expectation

It states that E[E[X|Y]]=E[X].
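A small exact check of that identity on a discrete joint distribution. The table below is hypothetical; if X is read as the 0/1 indicator of a hypothesis H and Y as the observed evidence, then E[X] is the prior P(H) and E[X|Y] is the posterior, which recovers Conservation of Expected Evidence as a special case.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution P(X = x, Y = y); X is a 0/1 indicator.
joint = {
    (0, "a"): Fraction(1, 8),
    (1, "a"): Fraction(3, 8),
    (0, "b"): Fraction(3, 8),
    (1, "b"): Fraction(1, 8),
}

# Marginal distribution of Y.
p_y = defaultdict(Fraction)
for (x, y), p in joint.items():
    p_y[y] += p

def cond_expectation(y):
    """E[X | Y = y], computed from the joint table."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y]

e_x = sum(x * p for (x, _), p in joint.items())                 # E[X]
e_of_cond = sum(cond_expectation(y) * p for y, p in p_y.items())  # E[E[X|Y]]

print(e_x, e_of_cond)  # 1/2 1/2
```

Here the conditional expectations are 3/4 and 1/4, each reached with probability 1/2, so they average back to the unconditional 1/2.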

I do not understand the validity of this statement:

Given a temporal proposition A among a set of other mutually exclusive temporal propositions {A, B, C...}, demonstrating that B, C, and other candidates do not meet the evidence so far *while* A meets the evidence so far does raise our confidence in the proposition *continuing to hold*. This is standard Bayesian inference applied to temporal statements.

For example, we have higher confidence in the statement “the sun will come up tomorrow” than the statement “the sun will not come up tomorrow”, because the sun *has* come up in the past, whereas it has *not* come up comparably fewer times. We have relied on the prior distribution to make confident statements about the result of an impending experiment, and can constrain our confidence using the number of prior experiments that conform to it. Further, every new experiment that confirms “the sun will come up” makes it harder to argue that “the sun will not come up” because the latter statement now has to explain *why* it failed to apply in the prior cases as well as why it will work now.

It would seem quantifying the prior distribution against a set of mutually-exclusive statements thus *is* a valid strategy for raising confidence in a specific statement.

Maybe I’m misinterpreting what “fixed proposition” means here or am missing something more fundamental?