Hypothesis 1: The data are generated by a beta-binomial distribution: first a probability x is drawn from a beta(a,b) distribution, and then 4 trials are run using that probability x. I had my coding assistant write code to solve for the a,b that best fit the observed data and show the resulting distribution for that a,b. It gave (a,b) = (0.6032,0.6040) and a distribution that was close but still meaningfully off given the million-experiment sample size (most notably, only .156 of draws from this model had 2 R's, compared with the observed .162).
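A minimal sketch of this kind of fit, assuming the data are summarized as counts of experiments with k R's out of 4 and using scipy's betabinom for the pmf. The counts below are invented placeholders (only the 2-R fraction, .162, is quoted above), so substitute the real tallies:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import betabinom

N = 4                      # each experiment is 4 trials, so R-counts run 0..4
ks = np.arange(N + 1)
# Counts of experiments with k R's, out of 1,000,000. Only the 2-R count
# is quoted in the text; the other four are placeholders roughly
# consistent with it.
counts = np.array([252_800, 166_200, 162_000, 166_200, 252_800])

def nll_h1(params):
    """Multinomial negative log-likelihood of the plain beta-binomial model."""
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf
    return -np.sum(counts * betabinom.logpmf(ks, N, a, b))

fit1 = minimize(nll_h1, x0=[1.0, 1.0], method="Nelder-Mead")
a1, b1 = fit1.x
print(f"(a, b) = ({a1:.4f}, {b1:.4f})")
print("model pmf:   ", np.round(betabinom.pmf(ks, N, a1, b1), 4))
print("observed pmf:", counts / counts.sum())
```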
Hypothesis 2: With probability c a data point is drawn from a beta-binomial distribution, and with probability 1-c the experiment instead uses p=0.5. This came to mind as a simple process that would produce more experiments with exactly 2 R's out of 4. With my coding assistant writing the code to solve for the three parameters a,b,c, this model came extremely close to the observed data: the largest error was .0003, and the difference was not statistically significant. It gave (a,b,c) = (0.5220,0.5227,0.9237).
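A sketch of the three-parameter fit, building on the previous snippet (it reuses N, ks, counts, betabinom, and minimize from above): the model pmf is a weighted mixture of a beta-binomial component and a plain binomial(4, 0.5) component.

```python
from scipy.stats import binom

def mixture_pmf(a, b, c):
    """Hypothesis 2 pmf: beta-binomial w.p. c, binomial(4, 0.5) w.p. 1 - c."""
    return c * betabinom.pmf(ks, N, a, b) + (1 - c) * binom.pmf(ks, N, 0.5)

def nll_h2(params):
    a, b, c = params
    if a <= 0 or b <= 0 or not 0 <= c <= 1:
        return np.inf
    return -np.sum(counts * np.log(mixture_pmf(a, b, c)))

fit2 = minimize(nll_h2, x0=[0.6, 0.6, 0.9], method="Nelder-Mead")
print("(a, b, c) =", np.round(fit2.x, 4))
```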
I could have stopped there, since the fit was good enough that anything further would probably only change the predictions after a few decimal places, but instead I went on to Hypothesis 3: the beta distribution is symmetric with a=b, so the probability is 0.5 with probability 1-c and drawn from beta(a,a) with probability c. I solved for a and c carrying more significant figures than my previous code (saving the rounding for the end), and found the fit was not statistically significantly worse than the asymmetric beta of Hypothesis 2. I went with this one because, on priors, a symmetric distribution is more likely than an asymmetric distribution that happens to be extremely close to symmetric. Final result: draw from a beta(0.5223485278, 0.5223485278) distribution with probability 0.9237184759, and use p=0.5 with probability 0.0762815241. This yields the above conditional probabilities out to 6 digits.
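A sketch of the symmetric fit plus a significance check, again reusing mixture_pmf and the placeholder counts from above. The text doesn't say which test was used, so a likelihood-ratio test is one way to formalize "not statistically significantly worse" when dropping one parameter (a = b):

```python
from scipy.stats import chi2

def nll_h3(params):
    """Hypothesis 3: same mixture, but with a symmetric beta, a = b."""
    a, c = params
    if a <= 0 or not 0 <= c <= 1:
        return np.inf
    return -np.sum(counts * np.log(mixture_pmf(a, a, c)))

fit3 = minimize(nll_h3, x0=[0.52, 0.92], method="Nelder-Mead",
                options={"xatol": 1e-12, "fatol": 1e-12})
a3, c3 = fit3.x
print(f"a = b = {a3:.10f}, c = {c3:.10f}")

# Under the symmetric null, 2 * (NLL3 - NLL2) is approximately chi-square
# with 1 degree of freedom; a large p-value means the symmetric model is
# not significantly worse than the asymmetric one.
lr = 2 * (fit3.fun - fit2.fun)
print(f"LR statistic = {lr:.4f}, p = {chi2.sf(lr, df=1):.4f}")
```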