I believe, mathematically, your claim can be expressed as:
P(H|D) = argmax_θ P(θ|D)
where θ is the “probability” parameter of the Bernoulli distribution, H represents the proposition that heads occurs, and D represents our data. The left side of this equation is the plausibility based on knowledge, and the right side is Professor Jaynes' ‘estimate of the probability’. How can we prove this statement?
Latex is being a nuisance as usual :) The right side of the equation is the argmax with respect to theta of P(theta | data)
I think argmax is not the way to go: with a binomial likelihood, the beta posterior is only symmetric when the coin is fair. If you want a point estimate, the mean of the distribution is better; it will always be closer to 50/50 than the mode, and thus more conservative. With argmax you are essentially ignoring all the uncertainty in θ and thus overestimating the probability.
What is the theoretical justification for taking the mean? Argmax feels more intuitive to me because it is literally “the most plausible value of θ”. In either case, whether we use the argmax or the mean, can we prove that it equals P(H|D)?
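One way to check numerically which point estimate equals P(H|D) (my own sketch, with an arbitrary 7-heads-in-10-flips example and a uniform prior): by the law of total probability, P(H|D) = ∫ θ p(θ|D) dθ, which is exactly the posterior mean, not the mode.

```python
# With a uniform Beta(1, 1) prior and h heads in n flips, the posterior
# is Beta(1 + h, 1 + n - h). P(H|D), the probability the next flip is
# heads after marginalizing over theta, is the posterior mean.
from math import gamma

h, n = 7, 10
a, b = 1 + h, 1 + (n - h)            # posterior Beta(8, 4)

post_mean = a / (a + b)              # E[theta | D] = 8/12
post_mode = (a - 1) / (a + b - 2)    # argmax_theta p(theta | D) = 7/10

def beta_pdf(t):
    # Beta(a, b) density at t
    return t**(a - 1) * (1 - t)**(b - 1) * gamma(a + b) / (gamma(a) * gamma(b))

# Numerically marginalize: P(H|D) = integral of theta * p(theta|D) dtheta
# (midpoint rule on [0, 1]).
steps = 100_000
pred = sum((i + 0.5) / steps * beta_pdf((i + 0.5) / steps) for i in range(steps)) / steps

print(post_mean, post_mode, pred)    # pred matches the mean, not the mode
```

So the mean, not the argmax, is the quantity that provably equals P(H|D).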
Suppose the distribution is 2 kids and a professional boxer, and a random one of them is going to hit me. Argmax tells me that I will always be hit by a kid. Sure, if you draw from the distribution only once, argmax beats the mean in 2/3 of the cases, but it is much worse at answering what will happen if I draw 9 hits (argmax predicts nothing much; the mean predicts the equivalent of 3 hits from a boxer).
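A quick Monte Carlo sketch of this example (the damage numbers 1 and 10 are my own hypothetical choices, not from the comment):

```python
import random

# Two kids deal damage 1 each, the boxer deals 10; each hit is a
# uniform draw over the three people.
population = [1, 1, 10]

mode_guess = 1                     # argmax: a kid is the single most likely draw
mean_guess = sum(population) / 3   # E[damage] = 4

# Over 9 draws, argmax predicts a total of 9 * 1 = 9; the mean predicts
# 9 * 4 = 36, which is the true expected total (~3 boxer hits' worth).
random.seed(0)
totals = [sum(random.choice(population) for _ in range(9)) for _ in range(10_000)]
avg_total = sum(totals) / len(totals)
print(mode_guess * 9, mean_guess * 9, avg_total)  # avg_total lands near 36
```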
This distribution is skewed, like the beta distribution, and is therefore better summarized by the mean than the mode.
In Bayesian statistics, argmax on σ will often lead to σ = 0: if you assume σ follows an exponential distribution, the prior density peaks at zero, so the MAP estimate can collapse there and lead you to assume that there is no variance in your sample.
The expected squared deviation is also lower around the mean than around the mode, if that counts as a theoretical justification :)
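To make that justification precise: the posterior mean is the point estimate minimizing expected squared loss. A standard derivation:

```latex
\mathbb{E}\left[(\theta - a)^2 \mid D\right]
  = \mathbb{E}[\theta^2 \mid D] - 2a\,\mathbb{E}[\theta \mid D] + a^2,
\qquad
\frac{d}{da}\,\mathbb{E}\left[(\theta - a)^2 \mid D\right]
  = 2a - 2\,\mathbb{E}[\theta \mid D] = 0
\;\Longrightarrow\;
a = \mathbb{E}[\theta \mid D].
```

Picking the mode instead corresponds to a different loss (0–1 loss in the limit), which is why it ignores the skew of the posterior.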