Log-odds (or logits)

brilee 28 Nov 2011 1:11 UTC

31 points

Logic & Mathematics Probability & Statistics

(I wrote this post for my own blog, and given the warm reception, I figured it would also be suitable for the LW audience. It contains some nicely formatted equations/tables in LaTeX, hence I’ve left it as a Dropbox download.)

Logarithmic probabilities have appeared previously on LW here, here, and sporadically in the comments. The first is a link to an Eliezer post which covers essentially the same material. I believe this is a better introduction/description/guide to logarithmic probabilities than anything else that’s appeared on LW thus far.

Introduction:

Our conventional way of expressing probabilities has always frustrated me. For example, it is very easy to make nonsensical statements like “110% chance of working”. Nor is it obvious that the difference between 50% and 50.01% is trivial compared to the difference between 99.98% and 99.99%. The conventional notation also fails to accommodate the math correctly when we want to say things like “five times more likely”, because 50% * 5 overflows 100%.
Jacob and I have (re)discovered a mapping from probabilities to log-odds which addresses all of these issues. To boot, it accommodates Bayes’ theorem beautifully. For something so simple and fundamental, it certainly took a great deal of Google searching/Wikipedia surfing to discover that they are actually called “log-odds”, and that they were “discovered” in 1944, instead of the 1600s. Also, nobody seems to use log-odds, even though they are conceptually powerful. Thus, this primer serves to explain why we need log-odds, what they are, how to use them, and when to use them.

Article is here (Updated 11/30 to use base 10)
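
As a rough illustration of the mapping described in the introduction (a minimal sketch, not taken from the linked article): the function names `prob_to_log_odds` and `log_odds_to_prob` are made up for this example, base 10 is assumed to match the updated article, and “five times more likely” is read here as “five times the odds”.

```python
import math


def prob_to_log_odds(p: float) -> float:
    """Map a probability strictly between 0 and 1 to base-10 log-odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.log10(p / (1.0 - p))


def log_odds_to_prob(x: float) -> float:
    """Map base-10 log-odds back to a probability."""
    odds = 10.0 ** x
    return odds / (1.0 + odds)


# The same 0.01-point step is tiny near 50% and large near 100%.
gap_near_half = prob_to_log_odds(0.5001) - prob_to_log_odds(0.5)
gap_near_one = prob_to_log_odds(0.9999) - prob_to_log_odds(0.9998)

# "Five times more likely" (read as five times the odds) never overflows:
# it is just an addition of log10(5) on the log-odds scale.
five_times_half = log_odds_to_prob(prob_to_log_odds(0.5) + math.log10(5))
```

In this sketch `gap_near_one` comes out more than a thousand times larger than `gap_near_half`, `five_times_half` is 5/6 rather than 250%, and a nonsensical input such as 1.10 is rejected with a `ValueError`.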


Jach 19 Nov 2017 14:55 UTC
13 points
Sorry for the necro—the linked article is 404′d. I uploaded a backup here. I didn’t find it on the author’s site but did find a copy through Web Archive; still, maybe my link will save someone else the hassle.
Manfred 28 Nov 2011 2:08 UTC
13 points
Comments:

Log base ten may be more intuitive for conversion purposes. Then each additional 9 in the probability (0.9, 0.99, 0.999, ...) adds roughly 1 to the log-odds.

“Five times more likely” should overflow for probabilities greater than 0.2. This is because the terminology “times more likely” is usually used in the context of decision-making, so it manipulates the linear probabilities because that’s what goes into the expected utility.
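
The point about nines can be checked numerically. This is only a sketch under the assumption that “adding another 9” means going from 0.9 to 0.99 to 0.999 and so on; `log_odds_of_nines` is a name invented for the example.

```python
import math


def log_odds_of_nines(n: int) -> float:
    """Base-10 log-odds of the probability 0.99...9 written with n >= 1 nines."""
    # p = 1 - 10**-n, so the odds p / (1 - p) are exactly 10**n - 1.
    return math.log10(10 ** n - 1)


# Pairs of (number of nines, base-10 log-odds rounded to three places).
nines_table = [(n, round(log_odds_of_nines(n), 3)) for n in range(1, 6)]
```

The step from one nine to two is slightly more than 1, and the steps get closer to exactly 1 as more nines are added.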
- brilee 28 Nov 2011 2:47 UTC
  8 points
  Yeah, I was definitely thinking about that. The mathematician in me won out in the end.
  
  It occurs to me that a lot of people have probably thought about this, and they have alternately used base 2, base e, and base 10. Unless we get the entire LW community to standardize on one base, we won’t be able to coherently communicate with one another using log-probabilities, and therefore log-probabilities will stay relegated to the dustbin.
  
  base 2 - advantages, we can talk about N bytes’ worth of evidences.
  
  base e—mathematician’s base
  
  base 10 - the common layperson can understand it; advantages with the 9’s and 0’s.
  
  Actually, I think you’re right, log base 10 is probably better. If others agree, I’ll rewrite the article in base 10.
  - Zack_M_Davis 28 Nov 2011 3:02 UTC
    12 points
    
    base e—mathematician’s base
    
    What’s the specific benefit of base e for log-odds, though? Base e has lots of special properties that make it useful in many areas of mathematics (e^x is its own derivative, de Moivre’s formula, &c.), but is this one of them? (It could be; I don’t know.)
    - [deleted] 28 Nov 2011 21:22 UTC
      12 points
      To quote Jaynes, p.91 of PT:TLoS:
      
      In many applications it is convenient to take the logarithm of the odds because of the fact that we can then add up terms. Now we could take the logarithm to any base we please, and this cost the writer some trouble. Our analytic expressions always look neater in terms of natural (base e) logarithms. But back in the 1940s and 1950s when this theory was first developed, we used base 10 logarithms because they were easier to find numerically; the four-figure tables would fit on a single page. Finding a natural logarithm was a tedious process, requiring leafing through enormous old volumes of tables.
      
      Today, thanks to hand calculators, all such tables are obsolete and anyone can find a ten-digit natural logarithm just as easily as a base 10 logarithm. Therefore, we started happily to rewrite this section in terms of the aesthetically prettier natural logarithms. But the result taught us that there is another, even stronger, reason for using base 10 logarithms. Our minds are thoroughly conditioned to the base 10 number system, and base 10 logarithms have an immediate, clear intuitive meaning to all of us. However, we just don’t know what to make of a conclusion stated in terms of natural logarithms, until it is translated back into base 10 terms. Therefore, we re-wrote this discussion, reluctantly, back into the old, ugly base 10 convention.
      
      So to answer your question, the only advantage of base e is that “ln” looks tidier than “log10”.
      
      Apart from being more intuitively understandable to humans, using base 10 also allows us to multiply by 10 and measure evidence in the familiar unit of decibels.
  - Steve_Rayhawk 30 Nov 2011 18:43 UTC
    8 points
    The natural unit of ratio, the neper (Np), is easier to interpret for small ratio contributions, where the derivative of exp(x) is ≈1:
    
    0.1Np = exp( 0.1) ∶ 1 ≈ 1.1 ∶ 1
    -0.1Np = exp(-0.1) ∶ 1 ≈ 0.9 ∶ 1
    
    This could make for an easy upgrade path to use of nepers or centinepers instead of percents in comparatives involving rates, which would reduce semantic confusion. “50% faster” can mean “gets 150% as far” (so .41Np faster, or 41 cNp, or perhaps 41Np%) or “takes 50% as much time” (so .69Np faster, or 69cNp, or 69Np%). That’s an argument for using nepers as a standard base outside communications of probability.
    
    (trivia: Nepers and radians are each other turned sideways, being respectively the real and imaginary parts of eigenvalues of linear differential equation systems.)
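
The neper figures in the comment above can be reproduced with a few lines. This is a sketch only; `ratio_to_nepers` is a hypothetical helper name, and the two readings of “50% faster” are the ones the comment itself gives.

```python
import math


def ratio_to_nepers(ratio: float) -> float:
    """Express a positive ratio in nepers (natural log of the ratio)."""
    return math.log(ratio)


# "50% faster" read as "gets 150% as far in the same time".
farther_np = ratio_to_nepers(1.5)
# "50% faster" read as "takes 50% as much time", i.e. twice the speed.
less_time_np = ratio_to_nepers(1 / 0.5)
# Small contributions stay close to the matching percentage change.
small_up = math.exp(0.1)
small_down = math.exp(-0.1)
```

The two readings come out near 0.41 Np and 0.69 Np respectively, which is the gap the comment is pointing at.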
  - wedrifid 28 Nov 2011 12:36 UTC
    8 points
    
    base 2 - advantages, we can talk about N bytes’ worth of evidences.
    
    Wouldn’t it be easier to talk about N bytes worth of evidence in base 256? Bits of evidence seems the more useful metric!
  - brilee 30 Nov 2011 16:12 UTC
    0 points
    Article is rewritten in base 10, and I rewrote some of the explanation for Bayesian updates. Enjoy!
  - shokwave 28 Nov 2011 11:36 UTC
    0 points
    I would like to see the article in base 10.
JoshuaZ 28 Nov 2011 14:55 UTC
4 points

and spuriously in the comments

I don’t think this word means what you think it means.

(Also I didn’t know you were on Less Wrong. I had previously plugged this summary of log-odds on my blog and was considering mentioning it here.)
Pato Lubricado 30 Mar 2021 8:36 UTC
1 point
Can I find the article somewhere else? Link is dead now
- philip_b 30 Mar 2021 9:53 UTC
  2 points
  See Jach’s reply.
orthonormal 30 Nov 2011 4:46 UTC
1 point
Good work! You might mention that the reason why log-odds are awful for things like adding probabilities of two disjoint events is that there’s not a nice formula for log(x+y). That’s the price of turning multiplication into addition.
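
To make the cost concrete, here is a minimal sketch (base 10 assumed; `to_prob`, `to_log_odds`, and `log_odds_of_either` are names invented for this example) of what adding two disjoint events looks like when everything is stored as log-odds:

```python
import math


def to_prob(log_odds: float) -> float:
    """Base-10 log-odds to probability."""
    odds = 10.0 ** log_odds
    return odds / (1.0 + odds)


def to_log_odds(p: float) -> float:
    """Probability to base-10 log-odds."""
    return math.log10(p / (1.0 - p))


def log_odds_of_either(a: float, b: float) -> float:
    """Log-odds that one of two disjoint events occurs, given each event's log-odds."""
    # No shortcut on the log-odds scale: convert back, add, convert again.
    return to_log_odds(to_prob(a) + to_prob(b))


a = to_log_odds(0.2)
b = to_log_odds(0.3)
either = log_odds_of_either(a, b)  # corresponds to probability 0.5
```

Simply adding `a` and `b` gives something quite different from `either`, which is why the round trip through ordinary probabilities is needed.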
thakil 28 Nov 2011 11:06 UTC
1 point
I find it interesting that you lack familiarity with log-odds. What field are you in? Statisticians will usually be familiar with them, as the logit is the canonical link function for the binomial distribution in generalized linear modeling. Cutting out (some of) the jargon: if I have a data set with binomial outcomes, and I wish to model my data as having normal errors and the predictors as having a linear effect on the outcome, I’d convert my data by using log-odds. So, for instance, if I was looking at age as a predictor for diabetes (which is a yes/no outcome)
- brilee 30 Nov 2011 16:09 UTC
  0 points
  I have a very strong competition math background from high school, but my primary field is chemistry.
roystgnr 28 Nov 2011 6:48 UTC
1 point
Of all the weird coincidences—I rediscovered this myself the week before last. (likewise inspired by previous LW discussion of log-odds, which seemed intuitively correct but not rigorously or symmetrically defined...)

What I failed to do, shamefully in view of your example, was to write everything up concisely and clearly to share with others. Thank you for being less short-sighted or less selfish.
Caspian 11 Mar 2012 2:29 UTC
0 points
It’s a good article for learning about log odds, but I disagree with some of the justification. Yes it is easy to say something has a 110% chance of working, but a nonsensical lie like this is better than a plausible lie which may trick you into believing it.
DanielLC 28 Nov 2011 5:05 UTC
0 points
It seems to me that this doesn’t have any real advantage over odds ratios. If I want to do a Bayesian update, I multiply the odds by the relative likelihood. In the example in the article (1/10,000 chance of having the disease, 3% false positive, and 1% false negative), you just take 1:9999 and multiply it by 0.99/0.03 = 33:1 for each successful test. Then you have 33:9999 = 1:303, then 33:303 = 11:101, and finally 363:101 for the final test. Then to change back, you just take 363/(363+101) = 78.23%. The calculations are slower (two multiplications vs. one addition), but it’s much easier and more intuitive to convert between them and traditional probabilities.
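
The arithmetic in this comment can be reproduced exactly with rational numbers. This is a sketch, assuming three positive tests (as the intermediate odds in the comment imply); the variable names are made up for illustration.

```python
from fractions import Fraction

prior_odds = Fraction(1, 9999)                           # 1:9999 for having the disease
likelihood_ratio = Fraction(99, 100) / Fraction(3, 100)  # 0.99 / 0.03 = 33

posterior_odds = prior_odds
for _ in range(3):                                       # one multiplication per positive test
    posterior_odds *= likelihood_ratio

# Convert the final odds back to an ordinary probability.
posterior_prob = posterior_odds / (1 + posterior_odds)
```

Using `Fraction` keeps every intermediate value exact, so the final odds reduce to 363:101 with no rounding along the way.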
- brilee 28 Nov 2011 5:55 UTC
  11 points
  What you’ve described is, in fact, exactly the same thing as log-odds: the two are simply separated by a logarithm/exponentiation. Thus, all the multiplications you describe are the counterpart of the additions I describe. I agree, we could work with odds ratios without taking the logarithm, but using logarithms has the benefit of linearizing the probability space. The distance between 1 L% and 5 L% is the same as the distance between 10 L% and 14 L%, but you wouldn’t know it by looking at 2.72:1 and 150:1 versus 22,000:1 and 1,200,000:1.
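
Both halves of this reply can be sketched in a few lines. The first part redoes the disease-test update from the parent comment additively in base 10. The second assumes that the “L%” figures are natural-log odds, which is what the quoted ratios imply; that reading is an inference, not something the thread states.

```python
import math

# Same update as the odds version: add logs instead of multiplying ratios.
log_prior = math.log10(1 / 9999)
log_evidence = math.log10(0.99 / 0.03)
log_posterior = log_prior + 3 * log_evidence   # three positive tests
posterior_odds = 10 ** log_posterior

# Equal steps in natural-log units look very unequal as raw odds.
low_pair = (math.exp(1), math.exp(5))      # about 2.72:1 and 150:1
high_pair = (math.exp(10), math.exp(14))   # about 22,000:1 and 1,200,000:1
low_ratio = low_pair[1] / low_pair[0]
high_ratio = high_pair[1] / high_pair[0]
```

Both pairs are separated by the same factor, exp(4), even though the raw odds give no visual hint of that.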
fubarobfusco 28 Nov 2011 1:49 UTC
−4 points
Pick up Jaynes’ Probability Theory and turn to the section on decibels of evidence, an even more convenient measure. Or for a summary see Eliezer’s 0 And 1 Are Not Probabilities in the sequences.

When you work in log odds, the distance between any two degrees of uncertainty equals the amount of evidence you would need to go from one to the other. That is, the log odds gives us a natural measure of spacing among degrees of confidence.
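
A small sketch of that “distance equals evidence” idea, using the decibel convention (10 times the base-10 log of the odds). The function name `evidence_needed_db` is hypothetical, chosen only for this example.

```python
import math


def evidence_needed_db(p_from: float, p_to: float) -> float:
    """Decibels of evidence needed to move from one probability to another."""
    odds_from = p_from / (1.0 - p_from)
    odds_to = p_to / (1.0 - p_to)
    # Distance on the log-odds scale, scaled by 10 to give decibels.
    return 10.0 * math.log10(odds_to / odds_from)


half_to_90 = evidence_needed_db(0.5, 0.9)
ninety_to_99 = evidence_needed_db(0.9, 0.99)
half_to_99 = evidence_needed_db(0.5, 0.99)
```

The two legs add up to the direct trip from 50% to 99%, which is what makes log-odds behave like a distance.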
- Zack_M_Davis 28 Nov 2011 2:15 UTC
  12 points
  
  Or for a summary see Eliezer’s 0 And 1 Are Not Probabilities
  
  (Downvoted; the OP already linked to that exact post.)