# Ege Erdil comments on Average probabilities, not log odds

• The experts in my model are designed to be perfectly calibrated. What do you mean by “they are overconfident”?

• The probability of the event is the expected value of the probability implied by M(T). The experts report M(X) for a random variable X sampled uniformly in [0,T]. M(T) differs from M(X) by a Gaussian of mean 0, and hence, knowing M(X), the expected value of M(T) is just M(X). But we want the expected value of the probability implied by M(T), which is different from the probability implied by the expected value of M(T), because expected value does not commute with nonlinear functions. So an expert reporting the probability implied by M(X) is not well-calibrated, even though an expert reporting M(X) is giving an unbiased estimate of M(T).

• I don’t know what you’re talking about here. You don’t need any nonlinear functions to recover the probability. The probability implied by is just , and the probability you should forecast having seen is therefore

since is a martingale.

I think you don’t really understand what my example is doing. is not a Brownian motion and its increments are not Gaussian; it’s a nonlinear transform of a drift-diffusion process by a sigmoid which takes values in . itself is already a martingale so you don’t need to apply any nonlinear transformation to M on top of that in order to recover any probabilities.

The explicit definition is that you take an underlying drift-diffusion process Y following

and let . You can check that this is a martingale by using Ito’s lemma.

If you’re still not convinced, you can actually use my Python script in the original comment to obtain calibration data for the experts using Monte Carlo simulations. If you do that, you’ll notice that they are well calibrated and not overconfident.

• Oh, you’re right, sorry; I’d misinterpreted you as saying that M represented the log odds. What you actually did was far more sensible than that.

• That’s alright, it’s partly on me for not being clear enough in my original comment.

I think information aggregation from different experts is in general a nontrivial and context-dependent problem. If you’re trying to actually add up different forecasts to obtain some composite result it’s probably better to average probabilities; but aside from my toy model in the original comment, “field data” from Metaculus also backs up the idea that on single binary questions median forecasts or log odds average consistently beats probability averages.

I agree with SimonM that the question of which aggregation method is best has to be answered empirically in specific contexts and theoretical arguments or models (including mine) are at best weakly informative about that.