Registering now that my modal expectation is that the situation in 2028 will mostly look the same as it does today. (To give one example from AI 2027: scaling neuralese is going to be hard, and while I can imagine a specific set of changes that would make it possible, it would require changing some fairly fundamental things about model architecture, which I can easily imagine taking 3 years to reach production. And neuralese is not the only roadblock to AGI.)
I think one of your general points is something like “slow is smooth, smooth is fast” and also “cooperative is smooth, smooth is fast”, both of which I agree with. But the whole “trauma” thing is too much like Bulverism for my taste.
Also tried this, and basically ended up with the same answer as commenter One.
The key idea is that we really only care about drawing 5 trials from this process, so we just have to find a probability distribution over 6 outcomes: the count of R across our 5 trials, from 0 to 5. 10^6 datapoints is enough to kill a fair amount of noise by self-averaging, so I treated the requirement that hiding a random trial reproduce the observed 4-trial count distribution as a hard constraint. (It's a linear constraint in the probabilities.) Then I did maximum-entropy optimization subject to that constraint. The output distribution over 5-trial counts looked pretty symmetric and was heavier towards the extremes.
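In case it's useful, here's a minimal sketch of that optimization (assuming the hidden trial is chosen uniformly at random; `p4_obs` is just a placeholder for the 4-trial count distribution you'd estimate from the 10^6 datapoints, which I'm not reproducing here):

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder for the observed distribution of R-counts over the 4 visible
# trials (k = 0..4); the real numbers would come from the 10^6 datapoints.
p4_obs = np.array([0.20, 0.18, 0.24, 0.18, 0.20])

# Hiding a uniformly random trial sends a 5-trial count j to a 4-trial
# count k with probability A[k, j], so the constraint is A @ q = p4_obs.
A = np.zeros((5, 6))
for k in range(5):
    A[k, k] = (5 - k) / 5      # the hidden trial was not R
    A[k, k + 1] = (k + 1) / 5  # the hidden trial was R

def neg_entropy(q):
    q = np.clip(q, 1e-12, None)   # guard against log(0)
    return np.sum(q * np.log(q))

res = minimize(
    neg_entropy,
    x0=np.full(6, 1 / 6),                 # start from the uniform distribution
    bounds=[(0, 1)] * 6,
    constraints=[{"type": "eq", "fun": lambda q: A @ q - p4_obs}],
)
q = res.x   # max-entropy distribution over the 5-trial R-count, j = 0..5
print(q)
```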
Another quick computation from these values yields the p(R | k) numbers asked for in the question, for k = 0 through 4: [0.11118619, 0.32422537, 0.49942029, 0.67519768, 0.88914787]
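That second step is just conditioning on which trial got hidden. A standalone sketch (the q values below are approximate, back-solved from the quoted p(R | k) only so the snippet runs on its own; in practice q comes out of the optimization above):

```python
import numpy as np

# Approximate max-entropy distribution over the 5-trial R-count (j = 0..5),
# back-solved from the quoted p(R | k) just to make this self-contained.
q = np.array([0.2250, 0.1407, 0.1350, 0.1347, 0.1400, 0.2246])

k = np.arange(5)                  # Rs seen among the 4 visible trials
num = q[1:] * (k + 1) / 5         # 5-count was k+1 and the hidden trial was R
den = num + q[:-1] * (5 - k) / 5  # plus: 5-count was k, hidden trial was not R
p_R_given_k = num / den
print(p_R_given_k)   # approximately the numbers quoted above
```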