Ege Erdil comments on Average probabilities, not log odds

Ege Erdil 14 Nov 2021 10:36 UTC
10 points
I’m not sure what you mean here. You don’t need any nonlinear functions to recover the probability. The probability implied by $M(T)$ is just $M(T)$, and the probability you should forecast having seen $M(X)$ is therefore
$\mathbb{P}(E \mid M(X)) = \mathbb{E}[1_{E} \mid \mathcal{F}_{X}] = \mathbb{E}\big[\mathbb{E}[1_{E} \mid \mathcal{F}_{T}] \mid \mathcal{F}_{X}\big] = \mathbb{E}[M(T) \mid \mathcal{F}_{X}] = M(X)$
by the tower property of conditional expectation and the fact that $M$ is a martingale.
I think you’ve misread what my example is doing. $M$ is not a Brownian motion and its increments are not Gaussian; it’s a drift-diffusion process passed through a sigmoid, so it takes values in $[0, 1]$. $M$ itself is already a martingale, so you don’t need to apply any further nonlinear transformation to $M$ in order to recover probabilities.
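The explicit definition is that you take an underlying drift-diffusion process $Y$ following
$dY = \frac{\sigma^{2}}{2} \left( \frac{e^{Y} - 1}{e^{Y} + 1} \right) dt + \sigma \, dz$
where $z$ is a standard Brownian motion, and let $M = 1 - 1/(e^{Y} + 1)$, which is the logistic sigmoid of $Y$. You can check that this $M$ is a martingale by using Itô’s lemma: the drift of $Y$ exactly cancels the second-order term, leaving $dM = \sigma M (1 - M) \, dz$.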
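A quick numerical sanity check of the martingale claim is to evaluate the $dt$ coefficient that Itô’s lemma gives for $dM$, namely $\mu(Y) M'(Y) + \frac{\sigma^2}{2} M''(Y)$, using central finite differences. This is a hypothetical check, not part of the original script; the names `m_of_y`, `drift`, `dt_coefficient_of_dM` and the drift scale `c` are made up for illustration. With `c = 0.5`, i.e. a drift of $\frac{\sigma^2}{2} \frac{e^Y - 1}{e^Y + 1}$, the coefficient should vanish up to finite-difference error, while any other scale leaves a residual of $(c - \frac{1}{2}) \sigma^2 (2M - 1) M (1 - M)$.

```python
import math


def m_of_y(y):
    # M = 1 - 1/(e^Y + 1), the logistic sigmoid of Y
    return 1.0 - 1.0 / (math.exp(y) + 1.0)


def drift(y, sigma, c=0.5):
    # Hypothetical drift family for Y: c * sigma^2 * (e^Y - 1)/(e^Y + 1)
    return c * sigma**2 * (math.exp(y) - 1.0) / (math.exp(y) + 1.0)


def dt_coefficient_of_dM(y, sigma, c=0.5, h=1e-3):
    # Itô's lemma: dM = [mu(Y) M'(Y) + (sigma^2/2) M''(Y)] dt + sigma M'(Y) dz.
    # M' and M'' are approximated with central finite differences of step h.
    m1 = (m_of_y(y + h) - m_of_y(y - h)) / (2 * h)
    m2 = (m_of_y(y + h) - 2 * m_of_y(y) + m_of_y(y - h)) / h**2
    return drift(y, sigma, c) * m1 + 0.5 * sigma**2 * m2


if __name__ == "__main__":
    for y in (-3.0, -1.0, 0.0, 0.5, 2.0):
        print(f"y={y:+.1f}  c=1/2: {dt_coefficient_of_dM(y, 1.0):+.2e}  "
              f"c=1: {dt_coefficient_of_dM(y, 1.0, c=1.0):+.2e}")
```

Both columns are zero at $Y = 0$, where the drift and $M''$ vanish together; elsewhere only the `c = 0.5` column stays at the level of numerical noise.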
If you’re still not convinced, you can use the Python script in my original comment to obtain calibration data for the experts via Monte Carlo simulation. If you do, you’ll see that they are well calibrated and not overconfident.
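A calibration check of this kind can be sketched as a small Monte Carlo simulation. This is a hypothetical reconstruction, not the script from the original comment: it assumes a single expert who reads $M(X)$ at time $X$ (`t_obs`), that $E$ then occurs with probability $M(T)$ so that $\mathbb{E}[1_E \mid \mathcal{F}_T] = M(T)$, that $Y$ starts at $0$ with drift $\frac{\sigma^2}{2} \tanh(Y/2) = \frac{\sigma^2}{2} \frac{e^Y - 1}{e^Y + 1}$, and that $Y$ is discretized with an Euler-Maruyama step. All function names and parameter values are illustrative.

```python
import math
import random


def logistic(y):
    # M = 1 - 1/(e^Y + 1), written via tanh to avoid overflow for large |Y|
    return 0.5 * (1.0 + math.tanh(y / 2.0))


def simulate(n_paths=20000, sigma=1.0, t_end=1.0, t_obs=0.5, n_steps=50, seed=0):
    """Return (forecast M(X), outcome 1_E) pairs from Euler-Maruyama paths of Y."""
    if not 0.0 <= t_obs < t_end:
        raise ValueError("need 0 <= t_obs < t_end")
    rng = random.Random(seed)
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    obs_step = min(round(t_obs / dt), n_steps - 1)  # step at which the expert reads M(X)
    pairs = []
    for _ in range(n_paths):
        y, forecast = 0.0, None
        for k in range(n_steps):
            if k == obs_step:
                forecast = logistic(y)
            # Assumed drift (sigma^2/2) * tanh(Y/2) plus Gaussian increment sigma * dz
            y += 0.5 * sigma**2 * math.tanh(y / 2.0) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        # E occurs with probability M(T)
        outcome = rng.random() < logistic(y)
        pairs.append((forecast, outcome))
    return pairs


def calibration_table(pairs, n_bins=10):
    """Bin forecasts; return (lo, hi, count, mean forecast, observed frequency) rows."""
    bins = [[] for _ in range(n_bins)]
    for p, e in pairs:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, e))
    rows = []
    for i, b in enumerate(bins):
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(e for _, e in b) / len(b)
            rows.append((i / n_bins, (i + 1) / n_bins, len(b), mean_p, freq))
    return rows


if __name__ == "__main__":
    for lo, hi, n, mean_p, freq in calibration_table(simulate()):
        print(f"[{lo:.1f}, {hi:.1f})  n={n:6d}  mean forecast={mean_p:.3f}  frequency={freq:.3f}")
```

If the forecasts are calibrated, the mean forecast and the observed frequency in each well-populated bin should agree up to sampling noise and a small discretization error from the Euler-Maruyama step.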
- AlexMennen 14 Nov 2021 16:20 UTC
  5 points
  Oh, you’re right, sorry; I’d misinterpreted you as saying that M represented the log odds. What you actually did was far more sensible than that.
  - Ege Erdil 14 Nov 2021 17:06 UTC
    3 points
    That’s alright, it’s partly on me for not being clear enough in my original comment.
    I think aggregating information from different experts is in general a nontrivial, context-dependent problem. If you’re trying to add up different forecasts to obtain some composite result, it’s probably better to average probabilities; but aside from my toy model in the original comment, “field data” from Metaculus also supports the idea that on single binary questions, the median forecast or the average of log odds consistently beats the average of probabilities.
    I agree with SimonM that the question of which aggregation method is best has to be answered empirically in specific contexts, and that theoretical arguments or models (including mine) are at best weakly informative about it.
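For concreteness, the three aggregation rules compared above (averaging probabilities, taking the median forecast, and averaging log odds) can be sketched as follows. The function names and the three expert forecasts are hypothetical, chosen only to show how the rules differ on a single binary question; averaging log odds is the same as taking the geometric mean of the odds.

```python
import math
import statistics


def average_probability(ps):
    # Arithmetic mean of the probabilities themselves
    return statistics.fmean(ps)


def median_forecast(ps):
    # Middle forecast (mean of the two middle ones for an even count)
    return statistics.median(ps)


def average_log_odds(ps):
    # Mean of logit(p) = log(p / (1 - p)), mapped back through the logistic function
    z = statistics.fmean([math.log(p / (1.0 - p)) for p in ps])
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    forecasts = [0.6, 0.7, 0.95]  # hypothetical forecasts from three experts
    for rule in (average_probability, median_forecast, average_log_odds):
        print(f"{rule.__name__}: {rule(forecasts):.3f}")
```

On these made-up numbers the probability average is 0.75, the median is 0.7, and the log-odds average is roughly 0.802, so the log-odds rule is pulled further toward the most confident expert. Which rule scores best on real questions is the empirical matter discussed above.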