On stopping rules

(tl;dr: In this post I try to explain why I think the stopping rule of an experiment matters. It is likely that someone will find a flaw in my reasoning. That would be a great outcome as it would help me change my mind. Heads up: If you read this looking for new insight you may be disappointed to only find my confusion)

(Edited to add: Comments by Manfred and Ike seem to point correctly to the critical flaws in my reasoning. I will try to update my intuition over the next few days)

In the post “Don’t You Care If It Works Part 1” on the Main section of this website, Jacobian writes:

A few weeks ago I started reading beautiful probability and immediately thought that Eliezer is wrong about the stopping rule mattering to inference. I dropped everything and spent the next three hours convincing myself that the stopping rule doesn’t matter and I agree with Jaynes and Eliezer. As luck would have it, soon after that the stopping rule question was the topic of discussion at our local LW meetup. A couple people agreed with me and a couple didn’t and tried to prove it with math, but most of the room seemed to hold a third opinion: they disagreed but didn’t care to find out. I found that position quite mind-boggling. Ostensibly, most people are in that room because we read the sequences and thought that this EWOR (Eliezer’s Way Of Rationality) thing is pretty cool. EWOR is an epistemology based on the mathematical rules of probability, and the dude who came up with it apparently does mathematics for a living trying to save the world. It doesn’t seem like a stretch to think that if you disagree with Eliezer on a question of probability math, a question that he considers so obvious it requires no explanation, that’s a big frickin’ deal!

First, I’d like to point out that the mainstream academic term for Eliezer’s claim is the strong likelihood principle. In the comments section of Jacobian’s post, a vigorous discussion of stopping rules ensued.

My own intuition is that the strong likelihood principle is wrong. Moreover, there are a small number of people whose opinions I credit more highly than Eliezer’s, and some of them also disagree with him. For instance, I’ve been present in the room when a distinguished Professor of Biostatistics at Harvard stated matter-of-factly that the principle is trivially wrong, and I observed that he was not challenged on this by another full Professor of Biostatistics who is considered an expert on Bayesian inference.

So at best, the fact that Eliezer supports the strong likelihood principle is a single data point, i.e., pretty weak Bayesian evidence. I do, however, value Eliezer’s opinion, and in this case I recognize that I am confused. Being a good rationalist, I’m going to take that as an indication that it is time for The Ritual. Writing this post is part of my “ritual”: it is an attempt to clarify exactly why I think the stopping rule matters, and to determine whether those reasons are valid. A likely outcome is that someone will identify a flaw in my reasoning, which would be very useful and would improve my map-territory correspondence.

--

Suppose there are two coins in existence, both of which are biased: Coin A comes up heads with probability 2/3 and tails with probability 1/3, whereas Coin B comes up heads with probability 1/3. Someone gives me a coin without telling me which one, and my goal is to figure out whether it is Coin A or Coin B. My prior is that the two are equally likely.

There are two statisticians who both offer to do an experiment: Statistician 1 says that he will flip the coin 20 times and report the number of heads. Statistician 2 would really like me to believe that it is Coin B, and says he will terminate the experiment as soon as there are more tails than heads. However, since Statistician 2 is kind of lazy and doesn’t have infinite time, he also says that if he reaches 20 flips he is going to call it quits and give up.
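
To pin down the two designs, here is a minimal simulation sketch in Python (my own; the function names are made up, and I am assuming “more tails than heads” means tails strictly outnumbering heads):

```python
import random

def statistician_1(p_heads, flips=20, rng=random):
    """Fixed design: flip the coin exactly `flips` times and report heads."""
    heads = sum(rng.random() < p_heads for _ in range(flips))
    return heads, flips

def statistician_2(p_heads, max_flips=20, rng=random):
    """Optional stopping: stop as soon as tails strictly outnumber heads,
    or after `max_flips` flips, whichever comes first."""
    heads = tails = 0
    while heads + tails < max_flips:
        if rng.random() < p_heads:
            heads += 1
        else:
            tails += 1
        if tails > heads:
            break
    return heads, heads + tails
```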

Both statisticians do the experiment, and both experiments end up with 12 heads and 8 tails. I trust both statisticians to be honest about the experimental design and the stopping rules.

In the experiment of Statistician 1, the probability of getting this outcome with Coin A is C(20,12) · (2/3)^12 · (1/3)^8 ≈ 0.1480, whereas the probability with Coin B is C(20,12) · (1/3)^12 · (2/3)^8 ≈ 0.0092. The binomial coefficients cancel, so the likelihood ratio is exactly 2^4 = 16, and the posterior probability of Coin A (after converting the prior to odds, applying the likelihood ratio, and converting back to probability) is 16/17 ≈ 0.94.
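
For concreteness, the arithmetic as a quick Python check (a sketch of my calculation, nothing more):

```python
from math import comb

p_a, p_b = 2/3, 1/3                # P(heads) for Coin A and Coin B
heads, flips = 12, 20
tails = flips - heads

# Binomial likelihoods under Statistician 1's fixed-length design
lik_a = comb(flips, heads) * p_a**heads * (1 - p_a)**tails   # ~0.1480
lik_b = comb(flips, heads) * p_b**heads * (1 - p_b)**tails   # ~0.0092

ratio = lik_a / lik_b              # the coefficients cancel: 2**4 = 16
posterior_a = ratio / (ratio + 1)  # prior odds 1:1, so 16/17 ~ 0.9412
```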

In the experiment of Statistician 2, however, I can’t just use the binomial distribution, because there is an additional data point which is not Bernoulli, namely the number of coin flips. I therefore have to calculate, for both Coin A and Coin B, the probability that he would not terminate the experiment prior to the 20th flip, and that at that stage he would have 12 heads and 8 tails. Since the probability of reaching 20 flips is much higher for Coin A than for Coin B, the likelihood ratio would be much higher than in the experiment of Statistician 1.
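
For what it’s worth, here is a sketch of the calculation that paragraph calls for, again assuming the rule fires as soon as tails strictly outnumber heads: a dynamic program over the flip paths that survive to flip 20 and end with 12 heads. Whether the resulting ratio actually differs from the 16 computed above is exactly the question at issue:

```python
from functools import lru_cache

def p_outcome(p_heads, heads_target=12, flips=20):
    """Probability, under Statistician 2's design, of surviving to flip
    `flips` (tails never strictly outnumbering heads along the way) and
    ending with exactly `heads_target` heads."""
    @lru_cache(maxsize=None)
    def walk(n, heads):
        tails = n - heads
        if tails > heads:        # the stopping rule would have fired
            return 0.0
        if n == flips:
            return 1.0 if heads == heads_target else 0.0
        return (p_heads * walk(n + 1, heads + 1)
                + (1 - p_heads) * walk(n + 1, heads))
    return walk(0, 0)

lik_a, lik_b = p_outcome(2/3), p_outcome(1/3)
print(lik_a, lik_b, lik_a / lik_b)
```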

This should not be unexpected: if Statistician 2 gives me data that supports the hypothesis his stopping rule was designed to discredit, then that data is stronger evidence than similar data coming from the neutral Statistician 1.

In other words, the stopping rule matters. Yes, all the evidence in the trial is still in the likelihood ratio, but the likelihood ratio is different because there is an additional data point. Not considering this additional data point is statistical malpractice.