PhilGoetz comments on Open thread, Mar. 9 - Mar. 15, 2015

PhilGoetz 11 Mar 2015 23:22 UTC
I think I was wrong to say that 1 bit of evidence = a likelihood multiplier of 2.

If you have a signal S, and P(x|S) = 1 while P(x|~S) = .5, then the likelihood multiplier is 2 and you get 1 bit of information, as computed by the KL-divergence of the distribution of x given S from the distribution given ~S. I think a signal like that would require an infinite amount of evidence to make P(x|S) = 1, so it's a theoretical signal found only in math problems, like a frictionless surface in physics.

If you have a signal S, and P(x|S) = .5 while P(x|~S) = .25, then the likelihood multiplier is 2, but you get only .2075 bits of information.
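
To make the two cases above concrete, here is a minimal sketch that reproduces both numbers. It assumes x is a binary outcome and that "information" means the KL-divergence, in bits, of the distribution given S from the distribution given ~S; that reading is my assumption, chosen because it reproduces the 1 bit and .2075 bit figures. The function names are made up for illustration.

```python
import math

def kl_bits(p, q):
    """KL-divergence D(Bernoulli(p) || Bernoulli(q)) in bits.

    Assumes 0 < q < 1 so that neither denominator is zero.
    """
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:  # a term with probability 0 contributes nothing
            total += a * math.log2(a / b)
    return total

def likelihood_multiplier(p, q):
    """P(x|S) / P(x|~S)."""
    return p / q

# Case 1: P(x|S) = 1, P(x|~S) = .5
multiplier_certain = likelihood_multiplier(1.0, 0.5)
bits_certain = kl_bits(1.0, 0.5)

# Case 2: P(x|S) = .5, P(x|~S) = .25
multiplier_partial = likelihood_multiplier(0.5, 0.25)
bits_partial = kl_bits(0.5, 0.25)

print(multiplier_certain, round(bits_certain, 4))
print(multiplier_partial, round(bits_partial, 4))
```

Both cases have the same likelihood multiplier, but in the second case half the probability mass sits on not-x, where the ratio is (.5 / .75) and the log is negative, which drags the expected value down.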

There's a discussion of a similar question on stats.stackexchange.com. It appears that the sum, over a series of observations x, of the log likelihood ratio

log( P(x | model 2) / P(x | model 1) )

approximates the information gain from changing from model 1 to model 2, but only in aggregate, not on a term-by-term basis. The approximation relies on the observations in the entire series occurring with frequencies close to those that model 2 predicts.
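
A rough sketch of what "not term-by-term" means, reusing the numbers from the second case above. It assumes binary observations, model 2 = P(x) = .5 and model 1 = P(x) = .25, and a hand-built series of 100 observations whose frequencies match model 2 exactly. The series and the function name are illustrative assumptions, not anything taken from the stackexchange thread.

```python
import math

def log_lr_bits(x, p_model2, p_model1):
    """log2( P(x | model 2) / P(x | model 1) ) for one binary observation."""
    if x:
        return math.log2(p_model2 / p_model1)
    return math.log2((1.0 - p_model2) / (1.0 - p_model1))

# Hypothetical series: 50 occurrences of x and 50 of not-x,
# i.e. frequencies that match model 2 (P(x) = .5) exactly.
observations = [True] * 50 + [False] * 50

terms = [log_lr_bits(x, 0.5, 0.25) for x in observations]
total_bits = sum(terms)
mean_bits = total_bits / len(observations)

print(sorted(set(round(t, 4) for t in terms)))  # the only two per-term values
print(round(mean_bits, 4))                      # average per observation
```

Each individual term is either +1 bit (when x occurs) or about -.585 bits (when it doesn't), and neither equals the KL-divergence. Only the average over the series comes out to about .2075 bits per observation, and only because the series has the frequencies model 2 predicts.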