Experiment: Test your priors on Bernoulli processes.
I have run 1,000,000 experiments. Each experiment consists of 5 trials with binary outcomes, either L (for left) or R (for right).
However, I’m not going to tell you how I’ve picked my experiments. Maybe I’m just flipping a fair coin each time. Maybe I’m using a biased coin. Or maybe I’m doing something completely different, like dropping a bouncy ball down a mountain and checking whether it hits a red rock or a white rock first—and different experiments are conducted on different mountains. I might be doing some combination of all three.
You do get one guarantee, though: All the experiments are Bernoulli processes. In particular, the order of the trials is irrelevant.
Your goal is to guess the marginal frequencies of the fifth trial. For each k ∈ {0, 1, 2, 3, 4}, you need to tell me the frequency that the fifth trial is an R given that k of the outcomes of the first four trials are Rs.
For example, if every experiment is just flipping a fair coin, then the fifth trial will be an R with probability 0.5, no matter what the first four are. However, if I’m using biased coins, then the frequency of R will increase the more Rs are seen.
To help you in your guessing, I have provided a csv of all the public trials. As an answer, please provide a list like [0.3, 0.4, 0.5, 0.6, 0.7] of your frequencies—the kth element of your list (counting from k = 0) is the marginal frequency over the experiments with k of the first four trials being Rs.
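As a rough sketch, tabulating the public data might look like the following. The csv layout is an assumption here (one experiment per row, four "L"/"R" entries), so the parsing would need adjusting to match the actual file:

```python
import csv
from collections import Counter

def empirical_counts(path):
    """Fraction of experiments with k Rs among their four public trials.
    Assumes one experiment per row with four "L"/"R" entries; adjust the
    parsing if the actual csv is laid out differently."""
    counts = Counter()  # counts[k] = number of experiments with k Rs
    with open(path, newline="") as f:
        for row in csv.reader(f):
            counts[row.count("R")] += 1
    total = sum(counts.values())
    return [counts[k] / total for k in range(5)]
```

These empirical 4-trial frequencies are the quantity the estimates in the comments condition on.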
I haven’t yet looked at the frequencies myself, but I will do so shortly after posting this. If you want to test your guesses against others, I have created a market on Manifold Markets. I will resolve the market before I reveal the correct frequencies, which will happen in around two weeks, but maybe earlier or later depending on trading volume.
Good luck!
How would you answer this without looking at the csv?
I wrote a post on my prior over Bernoulli distributions, called “Rethinking Laplace’s Law of Succession”. Laplace’s Law of Succession is based on a uniform prior over [0,1], whereas my prior is based on a mixture distribution described in that post.
Using this prior, we get the result [0.106, 0.348, 0.500, 0.652, 0.894]
The numbers are predictions for P(5th trial = R | k Rs observed in first 4 trials):
If you see 0 Rs in the first 4 trials (all Ls), there’s a 10.6% chance the 5th is R
If you see 1 R in the first 4 trials, there’s a 34.8% chance the 5th is R
If you see 2 Rs in the first 4 trials, there’s a 50% chance the 5th is R
If you see 3 Rs in the first 4 trials, there’s a 65.2% chance the 5th is R
If you see 4 Rs in the first 4 trials (all Rs), there’s an 89.4% chance the 5th is R
The five numbers from Laplace’s Rule of Succession (i.e. a uniform prior) are [0.167, 0.333, 0.500, 0.667, 0.833], but I think this is too conservative because it underestimates the likelihood of near-deterministic processes.
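For reference, the Laplace numbers fall straight out of the rule of succession, P(next = R) = (k + 1)/(n + 2) after k Rs in n trials:

```python
# Laplace's Rule of Succession: P(next = R) = (k + 1) / (n + 2)
# after observing k Rs in n trials.
n = 4
laplace = [(k + 1) / (n + 2) for k in range(n + 1)]
print([round(p, 3) for p in laplace])  # [0.167, 0.333, 0.5, 0.667, 0.833]
```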
I’m not entirely sure what’s being asked here. Is this asking “if we do experiment 1000001 and see k Rs in the first four trials, then what credence do you assign to the 5th trial being R?”
Or is it “if we take a random experiment out of the million and see k Rs in the first four trials, then what credence do you assign to the 5th trial being R”? This isn’t the same question as the first.
Or is it something else again?
It’s asking, “If I draw a histogram of the frequency of R of the fifth trial, with buckets corresponding to the number of Rs in the first four trials, what will the heights of the bars be?”
We are not doing any more experiments. All the experiments have already been done in the 1,000,000 provided experiments. I’ve just left out the fifth trial from these experiments.
This is almost the same question as, “If we do experiment 1000001 and see k Rs in the first four trials, then what credence do you assign to the 5th trial being R,” but not quite. Your goal is to predict the marginal frequencies for the experiments I have actually conducted, not any idealized “next experiment”. Because 1,000,000 experiments are so many, these should be close, but they are not quite the same. The actual marginal frequencies will have some noise, for example.
I hope this helps! If you need more explanation, feel free to ask.
Also tried this, and basically ended up with the same answer as commenter One.
Key idea is that we really only care about drawing 5 trials from this process. So we just have to find a probability distribution over 6 outcomes: a count of R for our 5 trials from 0-5. 10^6 datapoints is enough to kill a fair amount of noise by self-averaging, so I treated the fact that hiding a random trial has to reproduce the observed 4-trial distribution as just a hard constraint. (It’s a linear constraint in the probabilities.) Then did maximum entropy optimization subject to that constraint. The output distribution in terms of 5-trial counts looked pretty symmetric and was heavier towards the extremes.
Another quick computation from these values yields the p(R | k) numbers asked for in the question: [0.11118619, 0.32422537, 0.49942029, 0.67519768, 0.88914787]
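A minimal sketch of this maximum-entropy approach, using the symmetrized 4-trial frequencies quoted elsewhere in the thread rather than the full csv (so the output only approximates the numbers above). The hiding-a-random-trial constraint q[j] = p[j]·(5−j)/5 + p[j+1]·(j+1)/5 leaves one degree of freedom, over which we maximize entropy:

```python
import math

# Symmetrized 4-trial frequencies quoted in the thread (assumption: the
# real computation used the full csv instead).
q = [0.252854, 0.166231, 0.161832, 0.166231, 0.252854]

def five_trial_dist(t):
    # With p0 = t fixed, the constraint q[j] = p[j]*(5-j)/5 + p[j+1]*(j+1)/5
    # (hiding one of 5 exchangeable trials must reproduce the observed
    # 4-trial counts) determines p1..p5 by forward substitution.
    p = [t]
    for j in range(5):
        p.append((q[j] - p[j] * (5 - j) / 5) * 5 / (j + 1))
    return p

def entropy_at(t):
    p = five_trial_dist(t)
    if any(x <= 0 for x in p):
        return float("-inf")  # outside the feasible region
    return -sum(x * math.log(x) for x in p)

# Entropy is concave along this one-parameter family, so a coarse grid to
# bracket the feasible region plus ternary search finds the maximum.
grid = [i * q[0] / 10000 for i in range(10001)]
feasible = [t for t in grid if entropy_at(t) > float("-inf")]
lo, hi = feasible[0], feasible[-1]
for _ in range(100):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if entropy_at(m1) < entropy_at(m2):
        lo = m1
    else:
        hi = m2
p = five_trial_dist((lo + hi) / 2)

# P(5th = R | j Rs in first 4) = p[j+1]*(j+1)/5 / q[j]
answer = [p[j + 1] * (j + 1) / 5 / q[j] for j in range(5)]
print([round(x, 6) for x in answer])  # close to the values quoted above
```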
[0.111019, 0.324513, 0.5, 0.675487, 0.888981]
Answer:
[0.111020, 0.324512, 0.5, 0.675488, 0.888980]
I will provide my solution when the market is resolved.
Decided to provide my solution since others have done so as well.
Solution
The public dataset is approximately symmetric, so it is very likely that the distribution of the Bernoulli rate is also symmetric (the probability at p equals the probability at 1−p). Let the probabilities of getting k Rs over all 5 trials, for k = 0, …, 5, be (a, b, c, c, b, a). Then, from the public dataset, we have a + b/5 ≈ 0.252854, 4b/5 + 2c/5 ≈ 0.166231, and 6c/5 ≈ 0.161832. These have standard deviation ≈ 0.0004, which is negligible, so we can treat them as exact linear equations. Solving, we get a = 0.224781, b = 0.140359, c = 0.134860, and we can then solve for the marginal frequencies: (b/5)/(a + b/5) = 0.111020, (2c/5)/(4b/5 + 2c/5) = 0.324512, etc.
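The steps above amount to back-substituting three linear equations; a short sketch:

```python
# Observed 4-trial frequencies from the public dataset (values quoted above).
q0, q1, q2 = 0.252854, 0.166231, 0.161832

c = 5 * q2 / 6                 # from 6c/5 = q2
b = (q1 - 2 * c / 5) * 5 / 4   # from 4b/5 + 2c/5 = q1
a = q0 - b / 5                 # from a + b/5 = q0

answer = [
    (b / 5) / (a + b / 5),                  # P(R | 0 Rs in first 4)
    (2 * c / 5) / (4 * b / 5 + 2 * c / 5),  # P(R | 1 R)
    0.5,                                    # by symmetry
    1 - (2 * c / 5) / (4 * b / 5 + 2 * c / 5),
    1 - (b / 5) / (a + b / 5),
]
print([round(x, 6) for x in answer])  # [0.11102, 0.324512, 0.5, 0.675488, 0.88898]
```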
Not sure if this experiment is a good test of priors, since I got an exact answer without having to consider priors, other than assuming the data is symmetric. (This also means that any symmetric distribution for the Bernoulli rate will give the same answer.) Though @DaemonicSigil has a similar solution that doesn’t use symmetry, instead using maximum entropy as a prior (if I understand it correctly).
Still, almost all reasonable priors will result in very similar outcomes, differing by an amount probably on the order of the standard deviation (around 10^-3). This is likely less than, or at least comparable to, the noise in the actual marginal frequencies.
You’re mostly right. The other solvers have given pretty much identical distributions.
Some of your distributions are worse than others, though. If I run 100,000,000 experiments and calculate the frequencies, some of you will be further off at the fourth decimal place.
The market doesn’t have that kind of precision, and even if it did, I wouldn’t change the resolution criterion. But I can still score you guys myself later on.
I do agree that I should have given far fewer public experiments. Then it would have been a better test of priors.
I think those aren’t quite equivalent statements? If I pick my favorite string of bits, and shuffle it by a random permutation, then the probability of each bit being 1 is equal, the order is totally irrelevant (it was chosen at random), but it’s not Bernoulli because the trials aren’t independent of each other (if you know what my favorite string of bits is, you can learn the final bit as soon as you’ve observed all the rest.)
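A tiny enumeration makes this concrete, using a hypothetical 3-bit "favorite string":

```python
from itertools import permutations

# A hypothetical "favorite string" with two 1s and one 0, shuffled uniformly.
bits = (1, 1, 0)
perms = list(permutations(bits))  # all 3! orderings, equally likely

# Each position has the same marginal probability of being 1...
marginals = [sum(p[i] for p in perms) / len(perms) for i in range(3)]
print(marginals)  # each is 2/3

# ...but the trials are not independent: once two 1s have been seen,
# the last bit is forced to be 0.
last_given_two_ones = [p[2] for p in perms if p[0] == 1 and p[1] == 1]
print(last_given_two_ones)  # [0, 0]
```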
That’s what “in particular” means, i.e. “the order of the trials is irrelevant” is a particular feature.
Correct, they are not equivalent. The second statement is a consequence of the first. I made this consequence explicit to justify my choice later on to bucket by the number of Rs but not their order.
The first statement, though, is also true. It’s your full guarantee.
To clarify, the ground truth P(R) is constrained to be constant over the 5 trials of any given experiment?
The Bernoulli rate is drawn according to Beta(0.6, 0.6), giving the posterior prediction P(5th trial = R | k Rs in first 4) = (k + 0.6)/5.2.
No; your distribution gives probabilities [0.253247, 0.168831, 0.155844, 0.168831, 0.253247] for the number of Rs in the first four trials. This predicts that the number of experiments with two Rs is binomially (i.e. approximately normally) distributed with mean ~155844 and standard deviation ~363, but the actual number is 161832, around 16 standard deviations away from the mean.
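The arithmetic behind that check, as a quick sketch: the Beta(0.6, 0.6) proposal implies a Beta-binomial predictive for the number of Rs in the first four trials, which can be compared against the observed count of two-R experiments.

```python
import math

def beta_binom_pmf(j, n, a, b):
    # P(j successes in n trials) when the rate is Beta(a, b) distributed,
    # computed via log-gamma for numerical stability.
    lg = math.lgamma
    return math.comb(n, j) * math.exp(
        lg(j + a) + lg(n - j + b) - lg(n + a + b) - (lg(a) + lg(b) - lg(a + b))
    )

n_exp = 1_000_000
p2 = beta_binom_pmf(2, 4, 0.6, 0.6)       # predicted P(2 Rs in first 4 trials)
mean = n_exp * p2
sd = math.sqrt(n_exp * p2 * (1 - p2))
z = (161832 - mean) / sd                  # observed count vs. this prediction
print(round(p2, 6), round(z, 1))          # roughly 0.1558 and 16-ish sigma
```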