I think you’ve got this pretty much figured out. But you may be missing an additional subtlety.
You say “Bayesian likelihood ratios really do only depend on the probability each hypothesis assigned only to the information that you received”. This could be interpreted as saying that the “likelihood function” is the probability assigned to the information received, seen as a function of f. But the likelihood function is actually not a single function at all, but rather an equivalence class of functions of f that differ only by an overall positive scale factor.
You can see how this matters when comparing a report of an observation of 6 flips, such as HHHHHT, versus a report that gives only the number of tails, which is 1 in this case. The probability of HHHHHT as a function of f is (1−f)^5 f, but the probability of 1 tail is 6(1−f)^5 f, which is not the same function, but is in the same equivalence class, since it differs only by an overall factor of 6. Of course, this overall scale factor cancels out when looking at likelihood ratios for different values of f.
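A quick way to see that cancellation numerically, as a minimal Python sketch (the two values of f are arbitrary choices for illustration):

```python
from math import comb

def lik_sequence(f):
    # P(HHHHHT | f): five heads then one tail, with f = P(tails)
    return (1 - f) ** 5 * f

def lik_count(f):
    # P(exactly 1 tail in 6 flips | f): binomial, with C(6,1) = 6 orderings
    return comb(6, 1) * (1 - f) ** 5 * f

f1, f2 = 0.3, 0.7
print(lik_sequence(f1) / lik_sequence(f2))  # ratio from the full sequence
print(lik_count(f1) / lik_count(f2))        # same ratio: the factor of 6 cancels
```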
Yeah, I discovered that part by accident at one point: I used the binomial distribution formula in a situation where it didn’t really apply, but still got the right answer.
I would think the most natural way to write a likelihood function would be to divide by its integral from 0 to 1, so that the total area under the curve is 1. That way the integral from a to b gives the probability the hypothesis assigns to f lying between a and b. But all that really matters is the ratios, which stay the same even without that.
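For the example above, that normalization is easy to sketch in Python (assuming the same likelihood (1−f)^5 f; its integral over [0,1] is the Beta function B(2,6) = 1/42, so the normalized curve is 42(1−f)^5 f):

```python
from scipy.integrate import quad

L = lambda f: (1 - f) ** 5 * f      # unnormalized likelihood in f
area, _ = quad(L, 0.0, 1.0)
print(area)                          # 1/42, about 0.0238
L_norm = lambda f: L(f) / area       # area under the curve is now 1

f1, f2 = 0.3, 0.7
print(L(f1) / L(f2))                 # likelihood ratio before normalizing
print(L_norm(f1) / L_norm(f2))       # identical ratio after normalizing
```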
Integrals of the likelihood function aren’t really meaningful, even if normalized so the integral is one over the whole range. This is because the result depends on the arbitrary choice of parameterization: e.g., whether you parameterize a probability by p in [0, 1], or by log(p) in (−∞, 0]. In Bayesian inference, one always integrates the likelihood only after multiplying by the prior, which can be seen as a specification of how the integration is to be done.
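Here is a sketch of why the parameterization matters, reusing the likelihood from above: normalize it once as a “density” in f and once as a “density” in u = log(f), then ask each for the probability that f lies in a fixed interval (the interval [0.2, 0.4] is an arbitrary choice for illustration):

```python
from math import exp, log
from scipy.integrate import quad

L = lambda f: (1 - f) ** 5 * f   # likelihood as a function of f
M = lambda u: L(exp(u))          # the same curve read as a function of u = log(f)

area_f, _ = quad(L, 0.0, 1.0)
area_u, _ = quad(M, -50.0, 0.0)  # u ranges over (-inf, 0]; -50 is effectively -inf

p_f, _ = quad(lambda f: L(f) / area_f, 0.2, 0.4)
p_u, _ = quad(lambda u: M(u) / area_u, log(0.2), log(0.4))
print(p_f, p_u)                  # different numbers for the "same" event
```

The two answers disagree because a bare likelihood carries no Jacobian telling it how to transform between parameterizations; multiplying by a prior density, which is stated in some particular parameterization, is what supplies one.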