Experiment: Test your priors on Bernoulli processes.
I have run 1,000,000 experiments. Each experiment consists of 5 trials with binary outcomes, either L (for left) or R (for right).
However, I’m not going to tell you how I’ve picked my experiments. Maybe I’m just flipping a fair coin each time. Maybe I’m using a biased coin. Or maybe I’m doing something completely different, like dropping a bouncy ball down a mountain and checking whether it hits a red rock or a white rock first—and different experiments are conducted on different mountains. I might be doing some combination of all three.
You do get one guarantee, though: All the experiments are Bernoulli processes. In particular, the order of the trials is irrelevant.
Your goal is to guess the marginal frequencies of the fifth trial. For each k ∈ {0, 1, 2, 3, 4}, you need to tell me the frequency that the fifth trial is an R given that k of the outcomes of the first four trials are Rs.
For example, if every experiment is just flipping a fair coin, then the fifth trial will be an R with probability 0.5, no matter what the first four are. However, if I’m using biased coins, then the frequency of R will increase the more Rs are seen.
To help you in your guessing, I have provided a csv of all the public trials. As an answer, please provide a list like [0.3, 0.4, 0.5, 0.6, 0.7] of your frequencies—the kth element of your list (counting from k = 0) is the marginal frequency over the experiments with k of the first four trials being Rs.
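As a rough sketch, tabulating the public data might look like the following. The csv layout is an assumption here (one experiment per row, four "L"/"R" entries), so the parsing would need adjusting to match the actual file:

```python
import csv
from collections import Counter

def empirical_counts(path):
    """Fraction of experiments with k Rs among their four public trials.
    Assumes one experiment per row with four "L"/"R" entries; adjust the
    parsing if the actual csv is laid out differently."""
    counts = Counter()  # counts[k] = number of experiments with k Rs
    with open(path, newline="") as f:
        for row in csv.reader(f):
            counts[row.count("R")] += 1
    total = sum(counts.values())
    return [counts[k] / total for k in range(5)]
```

These empirical 4-trial frequencies are the quantity the estimates in the comments condition on.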
I haven’t yet looked at the frequencies myself, but I will do so shortly after posting this. If you want to test your guesses against others, I have created a market on Manifold Markets. I will resolve the market before I reveal the correct frequencies, which will happen in around two weeks, but maybe earlier or later depending on trading volume.
Good luck!
How would you answer this without looking at the csv?
I wrote a post on my prior over Bernoulli distributions, called “Rethinking Laplace’s Law of Succession”. Laplace’s Law of Succession is based on a uniform prior over [0,1], whereas my prior is based on a mixture distribution described in that post.
Using this prior, we get the result [0.106, 0.348, 0.500, 0.652, 0.894]
The numbers are predictions for P(5th trial = R | k Rs observed in first 4 trials):
If you see 0 Rs in the first 4 trials (all Ls), there’s a 10.6% chance the 5th is R
If you see 1 R in the first 4 trials, there’s a 34.8% chance the 5th is R
If you see 2 Rs in the first 4 trials, there’s a 50% chance the 5th is R
If you see 3 Rs in the first 4 trials, there’s a 65.2% chance the 5th is R
If you see 4 Rs in the first 4 trials (all Rs), there’s an 89.4% chance the 5th is R
The five numbers from Laplace’s Rule of Succession (i.e. a uniform prior) are [0.167, 0.333, 0.500, 0.667, 0.833], but I think this is too conservative because it underestimates the likelihood of near-deterministic processes.
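For reference, the Laplace numbers fall straight out of the rule of succession, P(next = R) = (k + 1)/(n + 2) after k Rs in n trials:

```python
# Laplace's Rule of Succession: P(next = R) = (k + 1) / (n + 2)
# after observing k Rs in n trials.
n = 4
laplace = [(k + 1) / (n + 2) for k in range(n + 1)]
print([round(p, 3) for p in laplace])  # [0.167, 0.333, 0.5, 0.667, 0.833]
```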
I’m not entirely sure what’s being asked here. Is this asking “if we do experiment 1000001 and see k Rs in the first four trials, then what credence do you assign to the 5th trial being R?”
Or is it “if we take a random experiment out of the million and see k Rs in the first four trials, then what credence do you assign to the 5th trial being R”? This isn’t the same question as the first.
Or is it something else again?
It’s asking, “If I draw a histogram of the frequency of R of the fifth trial, with buckets corresponding to the number of Rs in the first four trials, what will the heights of the bars be?”
We are not doing any more experiments. All the experiments have already been done in the 1,000,000 provided experiments. I’ve just left out the fifth trial from these experiments.
This is almost the same question as, “If we do experiment 1000001 and see k Rs in the first four trials, then what credence do you assign to the 5th trial being R,” but not quite. Your goal is to predict the marginal frequencies for the experiments I have actually conducted, not any idealized “next experiment”. Because 1,000,000 experiments are so many, these should be close, but they are not quite the same. The actual marginal frequencies will have some noise, for example.
I hope this helps! If you need more explanation, feel free to ask.
Also tried this, and basically ended up with the same answer as commenter One.
Key idea is that we really only care about drawing 5 trials from this process. So we just have to find a probability distribution over 6 outcomes: a count of R for our 5 trials from 0-5. 10^6 datapoints is enough to kill a fair amount of noise by self-averaging, so I treated the fact that hiding a random trial has to reproduce the observed 4-trial distribution as just a hard constraint. (It’s a linear constraint in the probabilities.) Then did maximum entropy optimization subject to that constraint. The output distribution in terms of 5-trial counts looked pretty symmetric and was heavier towards the extremes.
Another quick computation from these values yields the p(R | k) numbers asked for in the question: [0.11118619, 0.32422537, 0.49942029, 0.67519768, 0.88914787]
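A minimal sketch of this maximum-entropy approach, using the symmetrized 4-trial frequencies quoted elsewhere in the thread rather than the full csv (so the output only approximates the numbers above). The hiding-a-random-trial constraint q[j] = p[j]·(5−j)/5 + p[j+1]·(j+1)/5 leaves one degree of freedom, over which we maximize entropy:

```python
import math

# Symmetrized 4-trial frequencies quoted in the thread (assumption: the
# real computation used the full csv instead).
q = [0.252854, 0.166231, 0.161832, 0.166231, 0.252854]

def five_trial_dist(t):
    # With p0 = t fixed, the constraint q[j] = p[j]*(5-j)/5 + p[j+1]*(j+1)/5
    # (hiding one of 5 exchangeable trials must reproduce the observed
    # 4-trial counts) determines p1..p5 by forward substitution.
    p = [t]
    for j in range(5):
        p.append((q[j] - p[j] * (5 - j) / 5) * 5 / (j + 1))
    return p

def entropy_at(t):
    p = five_trial_dist(t)
    if any(x <= 0 for x in p):
        return float("-inf")  # outside the feasible region
    return -sum(x * math.log(x) for x in p)

# Entropy is concave along this one-parameter family, so a coarse grid to
# bracket the feasible region plus ternary search finds the maximum.
grid = [i * q[0] / 10000 for i in range(10001)]
feasible = [t for t in grid if entropy_at(t) > float("-inf")]
lo, hi = feasible[0], feasible[-1]
for _ in range(100):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if entropy_at(m1) < entropy_at(m2):
        lo = m1
    else:
        hi = m2
p = five_trial_dist((lo + hi) / 2)

# P(5th = R | j Rs in first 4) = p[j+1]*(j+1)/5 / q[j]
answer = [p[j + 1] * (j + 1) / 5 / q[j] for j in range(5)]
print([round(x, 6) for x in answer])  # close to the values quoted above
```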
[0.111019, 0.324513, 0.5, 0.675487, 0.888981]
Answer:
[0.111020, 0.324512, 0.5, 0.675488, 0.888980]
I will provide my solution when the market is resolved.
Decided to provide my solution since others have done so as well.
Solution
The public dataset is approximately symmetric, so it is very likely that the distribution of the Bernoulli rate is also symmetric (the probability at p equals the probability at 1−p). Let the probabilities of getting k Rs over all 5 trials, for k = 0, …, 5, be (a, b, c, c, b, a). Then, from the public dataset, we have a + b/5 ≈ 0.252854, 4b/5 + 2c/5 ≈ 0.166231, and 6c/5 ≈ 0.161832. These have standard deviation ≈ 0.0004, which is negligible, so we can treat them as exact linear equations. Solving, we get a = 0.224781, b = 0.140359, c = 0.134860, and we can then solve for the marginal frequencies: (b/5)/(a + b/5) = 0.111020, (2c/5)/(4b/5 + 2c/5) = 0.324512, etc.
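The steps above amount to back-substituting three linear equations; a short sketch:

```python
# Observed 4-trial frequencies from the public dataset (values quoted above).
q0, q1, q2 = 0.252854, 0.166231, 0.161832

c = 5 * q2 / 6                 # from 6c/5 = q2
b = (q1 - 2 * c / 5) * 5 / 4   # from 4b/5 + 2c/5 = q1
a = q0 - b / 5                 # from a + b/5 = q0

answer = [
    (b / 5) / (a + b / 5),                  # P(R | 0 Rs in first 4)
    (2 * c / 5) / (4 * b / 5 + 2 * c / 5),  # P(R | 1 R)
    0.5,                                    # by symmetry
    1 - (2 * c / 5) / (4 * b / 5 + 2 * c / 5),
    1 - (b / 5) / (a + b / 5),
]
print([round(x, 6) for x in answer])  # [0.11102, 0.324512, 0.5, 0.675488, 0.88898]
```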
Not sure if this experiment is a good test of priors, since I got an exact answer without having to consider priors, other than assuming the data is symmetric. (This also means that any symmetric distribution for the Bernoulli rate will give the same answer.) Though @DaemonicSigil has a similar solution that doesn’t use symmetry, instead using maximum entropy as a prior (if I understand it correctly).
Still, almost all reasonable priors will result in very similar outcomes, differing by an amount probably on the order of the standard deviation (around 10^-3). This is likely less than, or at least comparable to, the noise in the actual marginal frequencies.
You’re mostly right. The other solvers have given pretty much identical distributions.
Some of your distributions are worse than others, though. If I run 100,000,000 experiments and calculate the frequencies, some of you will be further off at the fourth decimal place.
The market doesn’t have that kind of precision, and even if it did, I wouldn’t change the resolution criterion. But I can still score you guys myself later on.
I do agree that I should have given far fewer public experiments. Then it would have been a better test of priors.
I think those aren’t quite equivalent statements? If I pick my favorite string of bits, and shuffle it by a random permutation, then the probability of each bit being 1 is equal, the order is totally irrelevant (it was chosen at random), but it’s not Bernoulli because the trials aren’t independent of each other (if you know what my favorite string of bits is, you can learn the final bit as soon as you’ve observed all the rest.)
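A tiny enumeration makes this concrete, using a hypothetical 3-bit "favorite string":

```python
from itertools import permutations

# A hypothetical "favorite string" with two 1s and one 0, shuffled uniformly.
bits = (1, 1, 0)
perms = list(permutations(bits))  # all 3! orderings, equally likely

# Each position has the same marginal probability of being 1...
marginals = [sum(p[i] for p in perms) / len(perms) for i in range(3)]
print(marginals)  # each is 2/3

# ...but the trials are not independent: once two 1s have been seen,
# the last bit is forced to be 0.
last_given_two_ones = [p[2] for p in perms if p[0] == 1 and p[1] == 1]
print(last_given_two_ones)  # [0, 0]
```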
That’s what “in particular” means, i.e. “the order of the trials is irrelevant” is a particular feature.
Correct, they are not equivalent. The second statement is a consequence of the first. I made this consequence explicit to justify my choice later on to bucket by the number of Rs but not their order.
The first statement, though, is also true. It’s your full guarantee.
To clarify, the ground truth P(R) is constrained to be constant over the 5 trials of any given experiment?
The Bernoulli rate is drawn according to Beta(0.6, 0.6), giving the posterior prediction P(5th trial = R | k Rs in first 4) = (k + 0.6)/5.2.
No; your distribution gives probabilities [0.253247, 0.168831, 0.155844, 0.168831, 0.253247] for the number of Rs in the first four trials. This predicts that the number of experiments with two Rs is binomially (i.e. approximately normally) distributed with mean ~155844 and standard deviation ~363, but the actual number is 161832, around 16 standard deviations away from the mean.
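The arithmetic behind that check, as a quick sketch: the Beta(0.6, 0.6) proposal implies a Beta-binomial predictive for the number of Rs in the first four trials, which can be compared against the observed count of two-R experiments.

```python
import math

def beta_binom_pmf(j, n, a, b):
    # P(j successes in n trials) when the rate is Beta(a, b) distributed,
    # computed via log-gamma for numerical stability.
    lg = math.lgamma
    return math.comb(n, j) * math.exp(
        lg(j + a) + lg(n - j + b) - lg(n + a + b) - (lg(a) + lg(b) - lg(a + b))
    )

n_exp = 1_000_000
p2 = beta_binom_pmf(2, 4, 0.6, 0.6)       # predicted P(2 Rs in first 4 trials)
mean = n_exp * p2
sd = math.sqrt(n_exp * p2 * (1 - p2))
z = (161832 - mean) / sd                  # observed count vs. this prediction
print(round(p2, 6), round(z, 1))          # roughly 0.1558 and 16-ish sigma
```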