The Fallacy of Large Numbers

I’ve been seeing this a lot lately, and I don’t think it’s been written about here before.

Let’s start with a motivating example. Suppose you have a fleet of 100 cars (or horses, or people, or whatever). For any given car, on any given day, there’s a 3% chance that it’ll be out for repairs (or sick, or attending grandmothers’ funerals, or whatever). For simplicity’s sake, assume all failures are uncorrelated. How many cars can you afford to offer to customers each day? Take a moment to think of a number.

Well, 3% failure means 97% success. So we expect 97 to be available and can afford to offer 97. Does that sound good? Take a moment to answer.

Well, maybe not so good. Sometimes we’ll get unlucky. And not being able to deliver on a contract is painful. Maybe we should reserve 4 and only offer 96. Or maybe we’ll play it very safe and reserve twice the expected number: 6 in reserve, 94 for customers. But is that overkill? Take note of what you’re thinking now.

The likelihood of having more than 4 unavailable is 18%. The likelihood of having more than 6 unavailable is 3.1%. About once a month. Even reserving 8, requiring 9 failures to get you in trouble, gets you in trouble 0.3% of the time. More than once a year. Reserving 9, three times the expected number, gets the risk down to 0.087%, or a little less often than once every three years. A number we can finally feel safe with.
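These are just binomial tail probabilities, and they’re cheap to check. A minimal sketch in Python (assuming, as above, that failures are independent):

```python
from math import comb

n, p = 100, 0.03  # fleet size, per-car daily failure probability

def prob_more_than(k):
    """P(more than k cars unavailable) when the count is Binomial(n, p)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

for reserve in (4, 6, 8, 9):
    print(f"reserve {reserve}: {prob_more_than(reserve):.3%} chance of falling short")
# Prints roughly 18.2%, 3.1%, 0.32%, and 0.087%.
```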

So much for expected values. What happened to the Law of Large Numbers? Short answer: 100 isn’t large.

The Law of Large Numbers states that for sufficiently large samples, the results look like the expected value (for any reasonable definition of “like”).

The Fallacy of Large Numbers states that your numbers are sufficiently large.

This doesn’t just apply to expected values. It also applies to looking at a noisy signal and handwaving that the noise will average away with repeated measurements. Before you can say something like that, you need to look at how many measurements, and how much noise, and crank out a lot of calculations. This variant is particularly tricky because you often don’t have numbers on how much noise there is, making it hard to do the calculation. When the calculation is hard, the handwave is more tempting. That doesn’t make it more accurate.
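For a feel of what that calculation involves, here’s a minimal sketch with made-up numbers. It assumes independent noise with a known standard deviation, which is precisely the number you often don’t have:

```python
from math import sqrt, ceil

# Made-up numbers: per-measurement noise and the precision you actually need.
sigma = 5.0    # standard deviation of a single measurement
target = 0.5   # how small the noise in the average has to be

# Averaging n independent measurements only shrinks the noise like sigma / sqrt(n) ...
def noise_after_averaging(n):
    return sigma / sqrt(n)

# ... so the measurements needed grow with the *square* of the improvement you want.
n_needed = ceil((sigma / target) ** 2)

print(f"after 10 measurements: noise ~ {noise_after_averaging(10):.2f}")  # ~1.58, still 3x the target
print(f"measurements needed to hit {target}: {n_needed}")                 # 100
```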

I don’t know of any general tools for saying when statistical approximations become safe. The best thing I know is to spot-check like I did above. Brute-forcing combinatorics sounds scary, but Wolfram Alpha can be your friend (as above). So can Python, which has native bignum support. Python has a reputation for being slow at number crunching, but with n < 1000 and a modern CPU it usually doesn’t matter.
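For instance, here’s one way to brute-force the fleet calculation from above with exact rational arithmetic: `math.comb` returns exact integers of arbitrary size, `Fraction` keeps every probability exact, and the whole thing still runs in a blink:

```python
from fractions import Fraction
from math import comb

n = 100
p = Fraction(3, 100)  # the 3% failure probability, kept as an exact rational

def tail(k):
    """Exact P(more than k failures out of n) -- no floating point anywhere."""
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

exact = tail(9)
print(exact)         # an exact (and enormous) fraction
print(float(exact))  # ~0.00087, matching the figure above
```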

One warning sign is if your tools were developed in a very different context than where you’re using them. Some approximations were invented for dealing with radioactive decay, where n resembles Avogadro’s Number. Applying these tools to the American population is risky. Some were developed for the American population. Applying them to students in your classroom is risky.
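To pick a concrete example of such an approximation: the normal approximation to the binomial is perfectly safe when n is enormous, but at n = 100 with a 3% rate it understates the tail we cared about above by roughly an order of magnitude. A quick sketch:

```python
from math import comb, erf, sqrt

n, p = 100, 0.03
mu, sd = n * p, sqrt(n * p * (1 - p))  # mean 3, standard deviation ~1.71

def exact_tail(k):
    """Exact P(more than k failures) for a Binomial(n, p) count."""
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_tail(k):
    """Normal approximation to the same tail, with the usual continuity correction."""
    z = (k + 0.5 - mu) / sd
    return 0.5 * (1 - erf(z / sqrt(2)))

print(f"exact:  {exact_tail(9):.6f}")   # ~0.000870
print(f"normal: {normal_tail(9):.6f}")  # ~0.000070 -- more than ten times too optimistic
```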

Another danger is that your dataset can shrink. If you’ve validated your tools for your entire dataset, and then thrown out some datapoints and divided the rest along several axes, don’t be surprised if some of your data subsets are now too small for your tools.
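A back-of-the-envelope sketch, with all the numbers made up:

```python
# All numbers hypothetical: a dataset that starts out comfortably large ...
total_rows = 5000
after_filtering = int(total_rows * 0.8)  # drop ~20% as outliers / bad records

# ... then gets divided along a few innocuous-looking axes.
regions, months, groups = 4, 12, 2
cells = regions * months * groups

print(after_filtering)          # 4000 -- still sounds like plenty
print(cells)                    # 96 subsets
print(after_filtering / cells)  # ~42 rows per subset, before any uneven splitting
```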

This fallacy is related to “assuming events are uncorrelated” and “assuming distributions are normal”. It’s a special case of “choosing statistical tools based on how easy they are to use, whether they’re applicable to your use case or not”.