So, what difference does Mr. Frequentist see between the two experiments? In George’s case we have no information except the final results. For Bessel, on the other hand, once we understand the method used to determine the results, we know that at every intermediate step before the final one the cure rate was less than 70%.
I don’t think that makes a difference. Assuming, as the scenario does, that different patients respond to treatment independently, the Bayesian’s sequence of outcomes and the frequentist’s are different permutations of 70 heads and 30 tails, each with the same probability. Whatever they say about the efficacy of the treatment, they say the same thing.
It might be unlikely for the frequentist’s sequence of outcomes to hit his target p-value only at the 100th patient, but that probability does not bear on the efficacy of the treatment, and the frequentist ignores it anyway.
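Here is a minimal sketch of that permutation argument, not anything from Jaynes’s text: the null cure rate of 0.5, the one-sided binomial p-value, and the particular orderings labelled `george` and `bessel` are my own illustrative assumptions.

```python
from math import comb

def likelihood(outcomes, p):
    """Likelihood of an ordered sequence of outcomes (1 = cure, 0 = failure)."""
    k, n = sum(outcomes), len(outcomes)
    return p ** k * (1 - p) ** (n - k)  # depends only on the counts, not the order

def one_sided_p_value(k, n, p0=0.5):
    """Probability of at least k cures in n patients under an assumed null cure rate p0."""
    return sum(comb(n, j) * p0 ** j * (1 - p0) ** (n - j) for j in range(k, n + 1))

# Two different orderings of the same 70 cures and 30 failures.
george = [1] * 70 + [0] * 30                    # one permutation (fixed n = 100 design)
bessel = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0] * 10    # another permutation
assert sum(george) == sum(bessel) == 70

# Identical likelihoods for any cure rate, hence identical Bayesian conclusions.
for p in (0.5, 0.7, 0.9):
    assert likelihood(george, p) == likelihood(bessel, p)

# The running p-value paths differ patient by patient...
path_g = [one_sided_p_value(sum(george[:n]), n) for n in range(1, 101)]
path_b = [one_sided_p_value(sum(bessel[:n]), n) for n in range(1, 101)]
# ...but the final p-value, like the likelihood, sees only the counts.
assert path_g[-1] == path_b[-1]
```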
In practice I would be more concerned that the motivation to get a desired result might corrupt the patient evaluations (assumed away in Jaynes’s thought experiment).
Correct: in reality the world doesn’t change if we reorder our results. The point is that, for a frequentist, it feels as though it should. Because the method is flawed, it seems right for the result to be trusted less. That is a bad way to analyze results, but not such a bad way to evaluate methodologies.
Your valid concern about corrupted results stems from the correlation between bad behavior and what a frequentist calls a bad methodology.
Bessel’s methodology is not inherently bad either. If Bessel believed the treatment would save lives and needed to keep going to prove it, wouldn’t he have behaved the same way?
We need a Bayesian approach to evaluating methodologies, both with and without informative priors. Something like this probably already exists in the literature, but we won’t displace p-values until it is common knowledge.
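For concreteness, here is the basic ingredient such an approach would build on, sketched rather than proposed as the methodology itself: a conjugate Beta–Binomial update of the cure rate with and without an informative prior. The specific priors are my own illustrative assumptions.

```python
# A minimal sketch, not an established method from the literature. The two
# priors are illustrative assumptions: a flat Beta(1, 1) and an informative
# Beta(30, 70) that expects the treatment to cure only about 30% of patients.

def beta_posterior(cures, failures, prior_a, prior_b):
    """Conjugate Beta update for a binomial cure rate; returns posterior mean and sd."""
    a, b = prior_a + cures, prior_b + failures
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5

# The update sees only the counts (70 cures, 30 failures), so George's fixed
# sample and Bessel's optional stopping yield the same posterior either way.
for name, (a, b) in [("flat prior", (1, 1)), ("informative prior", (30, 70))]:
    mean, sd = beta_posterior(70, 30, a, b)
    print(f"{name}: posterior cure rate ~ {mean:.3f} +/- {sd:.3f}")
```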