Questions from an imaginary statistical methods exam

Richard_Kennaway 4 Feb 2015 13:57 UTC

21 points

Answers to these questions should be expressed numerically, where possible, but no number should be given without a justification for the specific value.

1. Suppose that you have mislaid your house keys, something most people have experienced at one time or another. You look in various places for them: where you remember having them last, places you’ve been recently, places they should be, places they shouldn’t be, places they couldn’t be, places you’ve looked already, and so on. Eventually, you find them and stop looking.

Every time you looked somewhere, you were testing a hypothesis about their location. You may have looked in a hundred places before finding them.

As a piece of scientific research to answer the question “where are my keys?”, this procedure has massive methodological flaws. You tested a hundred hypotheses before finding one that the data supported, ignoring every failed hypothesis. You really wanted each of these hypotheses in turn to be true, and made no attempt to avoid bias. You stopped collecting data the moment a hypothesis was confirmed. When you were running out of ideas to test, you frantically thought up some more. You repeated some failed experiments in the hope of getting a different result. Multiple hypotheses, file drawer effect, motivated cognition, motivated stopping, researcher degrees of freedom, remining of old data: there is hardly a methodological sin you have not committed.

(a) Should these considerations modify your confidence or anyone else’s that you have in fact found your keys? If not, why not, and if so, what correction is required?

(b) Should these considerations affect your subsequent decisions (e.g. to go out, locking the door behind you)?

2. You have a lottery ticket. (Of course, you are far too sensible to ever buy such a thing, but nevertheless suppose that you have one. Maybe it was an unexpected free gift with your groceries.) The lottery is to be drawn later that day, the results available from a web site whose brief URL is printed on the ticket. You calculate a chance of about 1 in 100 million of a prize worth getting excited about.

(a) Once the lottery results are out, do you check your ticket? Why, or why not?

(b) Suppose that you do, and it appears that you have won a very large sum of money. But you remember that the prior chance of this happening was 1 in 100 million. How confident are you at this point that you have won? What alternative hypotheses are also raised to your attention by the experience of observing the coincidence of the numbers on your ticket and the numbers on the lottery web site?

(c) Suppose that you go through the steps of contacting the lottery organisers to make a claim, having them verify the ticket, collecting the prize, seeing your own bank confirm the deposit, and using the money in whatever way you think best. At what point, if any, do you become confident that you really did win the lottery? If never, what alternative hypotheses are you still seriously entertaining, to the extent of acting differently on account of them?

gjm 4 Feb 2015 14:28 UTC
11 points
1 (a and b) Not to speak of. The key difference is that the event of finding your keys is quite different from that of (say) getting a significant result in a medical trial. When I have what looks to me exactly like my key ring in my hand, with all the usual things on it, the possible failure modes (hallucination, someone having maliciously planted a near-duplicate in my house, …) are both rarer and weirder than when I have just found that 240 of the 400 patients in the treatment group got better, versus only 215 of the 400 in the control group. In the latter case there’s a 1% chance (or whatever) of getting the result merely by chance, and if the experiment is poorly designed in other ways there are plenty of other demonstrably-high-probability ways to get misleadingly good-looking results.
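As a rough check on the trial figures in this comment, a pooled two-proportion z-test can be sketched in a few lines. This is only an illustration under a normal approximation; the function name `two_proportion_p_value` is made up for the example, and the "1% (or whatever)" above is offered as a loose figure, not as the output of any particular test.

```python
import math

def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Return (z, one-sided p) for H0: both groups share one recovery rate."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled recovery rate under the null hypothesis of no treatment effect.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Upper-tail probability of a standard normal variable.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_sided

# Figures from the comment: 240 of 400 treated vs 215 of 400 controls.
z, p = two_proportion_p_value(240, 400, 215, 400)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

Under these assumptions the one-sided p-value comes out at a few percent, so a difference of this size arising by chance alone is far from rare.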

All those methodological “errors” might reduce the odds that I’ve really found my keys by, let’s generously say, a factor of 100. But once I have (as it seems to me) the keys in my hand, unless I’m knowingly impaired by drink or drugs or something, that’s probably at least 1000000:1 evidence that I’ve found them. So even a 100:1 discount, which would kill most published experimental findings, would leave me something like 99.99% confident of actually having my keys.
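The arithmetic behind that figure can be sketched as follows. The sketch treats the 1000000:1 evidence as the odds in favour of having found the keys (that is, it assumes roughly even odds before looking at what is in your hand), which is a simplification; the function name `discounted_confidence` is hypothetical.

```python
def discounted_confidence(evidence_odds, discount_factor):
    """Convert odds in favour, after a multiplicative discount, to a probability."""
    posterior_odds = evidence_odds / discount_factor
    return posterior_odds / (posterior_odds + 1)

# 1000000:1 evidence, discounted by a generous factor of 100.
confidence = discounted_confidence(1_000_000, 100)
print(f"{confidence:.4%}")
```

The discounted odds are 10000:1, which corresponds to a probability of 10000/10001.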

Other relevant differences, more relevant to (b) than to (a) and perhaps also relevant to whatever point you’re trying to make: once I’ve got my keys I will then try to use them and thereby rapidly get more evidence for (or, improbably, against) the correctness of my experimental result; the consequences of an error are rather minor compared with (e.g.) those of thinking a treatment cures cancer when it really doesn’t; bringing in some impartial other person to replicate my search is a bigger increment of relative effort than arranging a replication of a typical scientific experiment.

2 (a) Yes. A typical prize with that sort of probability might be on the order of $10M, for an expected benefit of 10c from checking the ticket. It takes only a few seconds to check.

(b) First I would double-check it, because the single highest-probability way to be wrong is self-deception. There are other hypotheses but empirically they seem to be improbable: e.g., I would expect a non-negligible fraction of cases in which someone is convincingly hoaxed into thinking that they have a majorly-winning lottery ticket to get publicized (because it seems like it makes quite a good story). So I might be, I dunno, 80% confident of having actually won at this point, which is plenty enough to justify attempting to claim. Note that the relevant probability to compare against is not “someone carried out a successful lottery hoax” (or whatever) but “someone carried out a successful lottery hoax on this occasion with me as target”, which is much lower in the same way as the probability of “I just won the lottery” is lower than that of “someone won a lottery once”.

(c) I think I would define “really did win the lottery” so that once I’ve had my claim checked and been paid there’s no further question (barring extreme options like having hallucinated the whole thing). Until that point I would still entertain the possibility that some kind of hoax or error is at work, though after getting confirmation from the organizers I don’t think this would make a big difference to my actions beyond encouraging me to check the subsequent steps in the process.

(I know you said you want numerical answers to everything, with good justification for them all, but you haven’t provided any reason why I should actually put in the effort to do that and I have chosen not to.)
Luke_A_Somers 4 Feb 2015 14:35 UTC
10 points
The last paragraph of 1 before the questions is just wrong all over.

You tested a hundred hypotheses before finding one that the data supported, ignoring every failed hypothesis.

Yes to the first part, but there’s nothing methodologically wrong with that. No to the second part. What is the alternative to all of these hypotheses—that you don’t actually have keys at all?

You really wanted each of these hypotheses in turn to be true, and made no attempt to avoid bias.

What sort of bias do you mean? The chances that you would conclude that you had actually found your keys when you had not is roughly zero.

You stopped collecting data the moment a hypothesis was confirmed.

Once you have your keys in your hand, you have strictly falsified all of the other hypotheses.

When you were running out of ideas to test, you frantically thought up some more.

… yes, and how is this methodologically unsound? When you have no explanations, it is time to invent some.

You repeated some failed experiments in the hope of getting a different result.

False negative rate is low but nonzero. False positive rate is multiple orders of magnitude lower, as noted. Repeat measurements are legit.

Basically, a vanishingly small false positive rate renders a lot of methodological ‘sins’ irrelevant.

~~~~

On 2a, sure. If the odds are 1 in a hundred million, the pot is probably decent (you didn’t specify, so I can’t estimate), and checking is cheap.

2b/c, A match between the numbers on the ticket and the numbers in the drawing is no more or less likely than the actual odds of winning. That doesn’t mean I won’t check carefully, but a match is very strong evidence. I would basically be confident at this point, though cognizant of the distant possibility of a screwup or fraud. This would be hard to quantify, since I do not have data on the subject.
- solipsist 5 Feb 2015 0:29 UTC
  0 points
  Parent
  
  I would basically be confident at this point, though cognizant of the distant possibility of a screwup or fraud.
  
  Clearly people you know don’t take pranking seriously enough. Your friend sitting quietly at their laptop might have overheard your conversation and deployed a mirror website with fraudulent information to make your day more surreal (as I have on more than one occasion). Faking a lottery ticket would be too cruel, but I would totally fake a New York Times article “Obama Calls on Somers to Stop F*#ing Around and Finish His Thesis Already”.
  
  edit: This comment is not on topic, but for some reason I can’t delete it
  - Luke_A_Somers 5 Feb 2015 1:27 UTC
    0 points
    Parent
    
    Clearly people you know don’t take pranking seriously enough.
    
    Yes, actually. It’s a bit disappointing. Just a little.
- Ben Pace 4 Feb 2015 22:29 UTC
  −1 points
  Parent
  Seconding this comment; I will just further add that
  
  You repeated some failed experiments in the hope of getting a different result.
  
  is known as ‘repeatability’, and is normally considered virtuous to do.
  - Luke_A_Somers 5 Feb 2015 1:32 UTC
    1 point
    Parent
    It’s usually more ‘virtuous’ to replicate the positive (i.e. information-laden, surprising) results, while repeating negative results seems more like fishing for better p-values.
gwillen 4 Feb 2015 16:30 UTC
7 points
I think I might summarize the lesson here (at least from part 1) as “strong enough evidence covereth a multitude of sins”, but I’d like to hear the poster’s thoughts as to what the lesson is meant to be.
Anders_H 5 Feb 2015 5:55 UTC
4 points
Question 1:

The problem you are interested in is about the validity of a single data point, which is not what statistical inference is meant to answer. Statistical hypotheses are about population (distribution) level parameters, not about individual observations.

Imagine I sample 100 individuals. Using a diagnostic test with 100% sensitivity and specificity, I find out that 20 of them have cancer, including one individual named Joe. If I make the claim that 20% of the general population have cancer, you can meaningfully ask me how certain I am about this claim. This is what statistics will allow me to formalize. However, if you ask me how certain I am that Joe has cancer, I will tell you that I am 100% certain (because the test was foolproof). This is not a statistical question anymore; the problem is not sampling variability.

The same holds for the case of my keys. What you are interested in is a single data point, i.e. whether I accurately concluded that the object found in location X was indeed my keys. In order to answer that, you need to reason about measurement error (sensitivity/specificity), not sampling variability. In this case, there is no reason to suspect measurement error. I will believe I have found my keys.

Question 2:

This is just a case where once you have collected your winnings, you have overwhelming evidence that you actually won the lottery: The probability of collecting your winnings if you did not win approaches zero, so the likelihood ratio will easily overpower 1 in 100 million.
- Lumifer 5 Feb 2015 7:30 UTC
  0 points
  Parent
  
  Statistical hypotheses are about population (distribution) level parameters
  
  That’s an unnecessarily narrow (and entirely frequentist) approach.
  
  Statistics is a toolbox for dealing with uncertainty.
  - Anders_H 5 Feb 2015 7:54 UTC
    0 points
    Parent
    I was responding to the original post, which said:
    
    You repeated some failed experiments in the hope of getting a different result. Multiple hypotheses, file drawer effect, motivated cognition, motivated stopping, researcher degrees of freedom, remining of old data: there is hardly a methodological sin you have not committed.
    
    I realize my wording may have been suboptimal, but some of these biases (such as multiple comparisons) only make sense in a frequentist framework.
    
    I was trying to explain why some of these methodological problems do not even apply in this example. It is not a matter of other evidence being strong enough to outweigh the methodological flaws. These methodological flaws are irrelevant to questions about the individual data points.
    
    For example, problems arising from biased stopping rules would arise if you were trying to estimate the proportion of all locations that contain keys that open your door. However, a biased stopping rule makes absolutely no difference for the integrity of the individual data points.
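A small exact calculation illustrates the first half of that claim. Assume, purely for illustration, that each location independently contains matching keys with probability `p`, that the search stops at the first success, and that the proportion is then naively estimated as 1/N, where N is the number of locations searched. The setup and the function name `expected_naive_estimate` are assumptions of this sketch, not part of the original comment.

```python
import math

def expected_naive_estimate(p, max_searches=10_000):
    """Expected value of 1/N when N is the search count up to the first success."""
    # N is geometric: P(N = n) = p * (1 - p)**(n - 1). The series is truncated
    # at max_searches, where the remaining tail is negligible for p = 0.1.
    return sum((1 / n) * p * (1 - p) ** (n - 1) for n in range(1, max_searches + 1))

true_p = 0.1
naive = expected_naive_estimate(true_p)
# Closed form of the same series: -p * ln(p) / (1 - p).
closed_form = -true_p * math.log(true_p) / (1 - true_p)
print(f"true proportion {true_p}, expected naive estimate {naive:.3f}")
```

The naive estimate is biased upward, so the stopping rule does distort the population-level estimate. Whether the final location really contained the keys is a separate question that the calculation never touches.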
Richard_Kennaway 4 Feb 2015 22:17 UTC
3 points
As several people have asked about my intentions in posing these problems, I’ll answer here.

What I was interested in was seeing how people deal with extreme probabilities.

Some people have in the past expressed the view on LW that it is not humanly possible to be justifiably 80 decibans sure of anything. You would have to be able to be right about it with an error rate of no more than 1 in 100 million. Who can be right that often about anything? Surely, some would say, it must remain more likely that you’re dreaming, or hypnotised, or being trolled by the Matrix Lords, or something else that you haven’t even thought of, for who can scour out every last hundred millionth of possibility space? And yet, ordinary people, who have never learned to believe that it is impossible, have no difficulty in collecting the Euromillions jackpot, which has approximately those odds against. If they are as sure afterwards that they have won as they would have been sure before that they would not, that’s a swing of 160 decibans.
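For readers unfamiliar with the unit: a deciban is ten times the base-10 logarithm of the odds, so odds of 100 million to 1 correspond to 80 decibans. A minimal sketch of the conversion, with hypothetical helper names:

```python
import math

def odds_to_decibans(odds):
    """Express odds in favour (e.g. 100_000_000 for 100 million to 1) in decibans."""
    return 10 * math.log10(odds)

def decibans_to_odds(decibans):
    """Inverse conversion, from decibans back to odds in favour."""
    return 10 ** (decibans / 10)

before = odds_to_decibans(1 / 100_000_000)  # odds of winning before the draw
after = odds_to_decibans(100_000_000)       # equally sure of having won afterwards
swing = after - before
print(f"before: {before:.0f} dB, after: {after:.0f} dB, swing: {swing:.0f} dB")
```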

BTW, that was the lottery I had in mind in composing the example, and is not a fly-by-night operation. I might have sharpened the example by adding that. Someone wins the Euromillions jackpot every few weeks, for a prize of 10 to above 100 million pounds, depending on how many weeks it has rolled over.

The current consensus in the comments, though, is that the evidence of the house keys is strong enough that the posterior certainty that I have them is not perceptibly swayed by methodological flaws gross enough to completely discredit any paper that relied on statistical techniques to support its claims, and that I can be justifiably sure I have won the lottery at least by the time my bank confirms receipt of the money. These are my own views too.

“0 and 1 are not probabilities”, people still say here from time to time, yet a lot of everyday life runs well enough on 0s and 1s.
- gwillen 6 Feb 2015 1:38 UTC
  0 points
  Parent
  I think a lot of probabilistic and behavioral reasoning starts to break down and act strangely in the presence of very large odds ratios.
  
  For example, if I discover that I have won the lottery, how should I estimate the probability that I am hallucinating, or dreaming, or insane? In the first case, I cannot trust the evidence of my senses, but I can still reason about that evidence, so I should at least be able to work out a P(hallucination). In the second case, my memory and reasoning faculties are probably significantly impaired, BUT any actions I take will actually have no effect on the world, so I should consider this case when computing questions about truth, but IGNORE it when computing questions about action. In the third case, it’s likely that I can’t even reason coherently, so it’s not clear how to weigh this state at all. Conditional on being in it, my reasoning is questionable; conditional on my being able to reason about probabilities, I’m very likely (how likely?) not in it; therefore when reasoning about how to behave, I should probably discount it by what seems to be a sort of anthropic reasoning.
  
  So whatever the probabilities are that I can’t trust my senses / that I can’t trust my own reasoning abilities, it’s going to be very hard for me to reason directly about probabilities more extreme than that in many cases.
Alsadius 8 Feb 2015 1:40 UTC
2 points
2a) Yes, because lottery tickets have a large number of smaller prizes, and there’s a non-trivial chance I won one of those.

b) Aside from natural human disbelief, fairly high. The only alternative hypothesis that comes to mind (assuming it’s a reputable lottery and I’m sure I’m on the right site) is that they mis-posted the numbers, but there’s no good reason to believe they’d have picked mine there, and their posting error rate can be estimated at perhaps 0.1%, so I’d be 99.9% confident if that’s the only alternative. Some sort of smartass scam is possible, but really unlikely. (I’d double-check the dates, of course)

c) Ticket verification, again assuming it’s a reputable lottery. Money in the bank if it’s a fly-by-night operation.
Houshalter 6 Feb 2015 12:10 UTC
2 points

there is hardly a methodological sin you have not committed

The “methodological sins” are intended to prevent bias from influencing very noisy data. In many experiments it’s quite possible to get a positive result by random chance. I don’t think it’s possible to think you found your keys, without having actually found them.

Whenever you can verify a positive result with certainty, then the search procedure doesn’t matter. Just use whatever method is the fastest. You only need to be careful when the result isn’t certain and the bias of the search procedure can influence the outcome.
- fubarobfusco 6 Feb 2015 17:03 UTC
  4 points
  Parent
  
  I don’t think it’s possible to think you found your keys, without having actually found them.
  
  Sure it is.
  
  You may have found some other keys you had in your house that are not the ones you were looking for, and mistaken them for the right keys on sight.
  
  You may have put some object in your pocket, and now believe that it was your keys, when in fact you’re remembering the incident of putting your keys in your pocket yesterday. The object in your pocket today is your phone, which you usually put in your backpack.
  
  You may be schizophrenic and suffering a hallucination. The business man’s job is giving the business. You are not the business man.
  
  Your keys may have been missing due to the action of an intelligent agent, who has also substituted false keys for them. For instance, your child or roommate may be playing a prank on you.
  
  You may have become distracted by some other pressing goal (say, swatting a mosquito that is about to bite you) and failed to continue the search for your keys. Then, upon absent-minded reflection (what was I doing? why did I come into this room?) you rationalize that you stopped looking for the keys because you found them.
  
  Human brains are not just biased; they’re also glitchy.
  
  (The doubt “Maybe these are not my keys, maybe they are X” is canceled by “maybe these are not X, maybe they are my keys” and the test is to go try to lock the door with them. You can’t lock the door with your phone.)
NoSignalNoNoise 5 Feb 2015 4:29 UTC
2 points
1.

(a) No, they should not modify anyone’s confidence that you have found your keys. The procedure for determining whether you are holding your keys is sufficiently reliable that there is no need to doubt its results. What the factors mentioned should affect is your (and others’) confidence in the efficacy of the procedure you used to find them. If you made 100 attempts to find your keys and 1 succeeded, this is weak evidence at best that the attempt that happened to succeed had a good a priori reason to.

(b) No.
CronoDAS 5 Feb 2015 2:29 UTC
2 points
Follow-up question:

1) You’ve flipped a (fair) coin 100 times and recorded the results. The odds of you having gotten that exact sequence is 1 in 2^100, or about one in 10^30. Things that improbable just don’t happen, so if you tell me your sequence, why should I believe you?
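The arithmetic in the question can be checked directly, since Python integers are exact at this size. This only confirms the count of possible sequences; it does not attempt to answer the question.

```python
import math

# Number of distinct heads/tails sequences of length 100.
n_sequences = 2 ** 100
print(n_sequences)
print(f"about 10^{math.log10(n_sequences):.1f}")
```

Every specific sequence has the same probability, one in `n_sequences`, whichever one is reported.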
NoSignalNoNoise 5 Feb 2015 4:43 UTC
1 point
2.

(a) Assuming you are perfectly rational and that money has logarithmic VNM utility, checking the lottery ticket is not worth the time spent. However, System 1 doesn’t understand very low probabilities, so it will probably distract you by wondering whether you won. Given bounded rationality, it’s probably worthwhile to check just to make the distraction go away.

(b) There is a ~1 in 1000 chance that I would read the numbers wrong. If misreading the numbers and actually having won are the only possibilities, there is about a 1 in 100k chance that I’ve won. Depending on the specific circumstances of how I came to possess the ticket and how legitimate it and the website look, I would also entertain the possibility that it’s a scam or a prank.
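The 1 in 100k figure follows from Bayes’ theorem if one assumes, pessimistically, that every misreading of a losing ticket looks like a win and that a genuine win is always read correctly. Both assumptions, and the function name `posterior_win`, belong to this sketch.

```python
def posterior_win(prior_win=1e-8, p_misread_as_win=1e-3):
    """P(really won | the numbers appear to match)."""
    # An apparent match arises either from a real win or from misreading a loss.
    p_apparent_win = prior_win + (1 - prior_win) * p_misread_as_win
    return prior_win / p_apparent_win

posterior = posterior_win()
print(f"about 1 in {1 / posterior:,.0f}")
```

If only a small fraction of misreadings happened to produce an apparent jackpot, `p_misread_as_win` would be far smaller and the posterior correspondingly higher.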

(c) I would expect banks to be pretty careful about confirming large deposits, so I would be pretty confident that I had won (p > 0.5) when the deposit was confirmed. A few weeks later if the bank had not reported any issues with the deposit, I would be very confident (p > 0.99) that I had won.

The prior improbability of this whole scenario would cause me to update in favor of the simulation hypothesis, because a simulator would probably have a greater propensity to simulate people receiving large windfalls than a natural world is to generate them.

I consider the hypothesis “I am living in a simulation where I just won an extremely large amount of money” to be a subset of “I just won an extremely large amount of money”.
Epictetus 5 Feb 2015 16:33 UTC
0 points
1: I submit that I merely tested one hypothesis—that the keys were in my house—and sampled with replacement. While non-optimal, the use of a probabilistic search algorithm has precedent, e.g. in paleontology and astronomy.

(a) No less confidence than one would have in the theory of evolution. The manner of finding support among the fossil record consists of excavating likely locations, and in a like manner I have performed a search of the likely location for my keys, being my domicile, and proceeded until all locations were exhausted.

(b) The uncertainty of my keys being in my pocket is no greater than the uncertainty in the door being locked (as my door only has a sliding bolt that requires a key to operate). If the probability of my keys not being in my pocket is a (being less than or equal to the probability of the door failing to lock), then the probability of failing to open the door is a(1-a) = a - a^2. This reaches its maximum value at a = ¹⁄₂, which still leaves me with a ³⁄₄ chance of entering my house.

2: (a) Naturally. As my Internet use generally has a negative value, anything with a positive expectation can only be a benefit.

(b) We wish to compute the probability of winning given the event that I observed the winning numbers on the lottery website. Suppose, then, that a given observation of the winning lottery numbers has a 95% chance of being correct. The probability can be found by a trivial application of Bayes’ theorem, obtaining about 19 in 100,000,000, which is roughly one part in 5 million. The question, then, becomes: how many independent observations would it take for the expected probability of winning to make it worth submitting a claim to the lottery office? I leave this as an exercise.
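One way to approach the exercise is sketched below. It assumes the prior of 1 in 100 million from the question, reads "95% chance of being correct" symmetrically (a 5% chance of seeing a match when there is none), and treats repeated observations as fully independent, which real re-checks of the same web page would not be. The function names and the 50% threshold are hypothetical choices for illustration only.

```python
def posterior_after(n_observations, prior=1e-8, accuracy=0.95):
    """Posterior probability of a win after n independent observations all showing a match."""
    likelihood_ratio = accuracy / (1 - accuracy)  # 19 for 95% accuracy
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** n_observations
    return posterior_odds / (1 + posterior_odds)

def observations_needed(threshold, prior=1e-8, accuracy=0.95):
    """Smallest number of matching observations that lifts the posterior to the threshold."""
    n = 0
    while posterior_after(n, prior, accuracy) < threshold:
        n += 1
    return n

for n in range(1, 9):
    print(n, f"{posterior_after(n):.3g}")

needed_for_even_odds = observations_needed(0.5)
print("observations for better-than-even odds:", needed_for_even_odds)
```

Each matching observation multiplies the odds by 19, so the number needed grows only logarithmically with how improbable the win was to begin with.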

Alternative hypotheses include: misreading the page, looking at the wrong drawings, erroneous information on the page, and pranks by some rogue.

(c) Once I’m able to spend the money, I expect to be far too intoxicated to ponder the metaphysics of the matter.
Ishaan 4 Feb 2015 18:27 UTC
0 points

(a) Should these considerations modify your confidence or anyone else’s that you have in fact found your keys? If not, why not, and if so, what correction is required?

a) Yes, these considerations should have an impact on your confidence. However, this being an evolutionarily familiar scenario, your intuition has already made the necessary adjustment such as correcting for multiple comparisons and all the other hyper-complex issues, so don’t whip out a pencil and “correct” yourself.

b) Yes. It is not always negligible. For example, if you were searching an entire apartment complex with identical doors for your keys and you found one in the hall you’d check the key against the lock first to see if it were yours before locking yourself out. Or, you might ask your roommate if they had their keys, to ensure you didn’t accidentally steal it. The considerations can alter your certainty.

Once the lottery results are out, do you check your ticket? Why, or why not?

a) I am weighing opportunity cost against potential winnings and you did not say how much the prize was or where I got the ticket from or what the URL was and if I recognized the domain name. (In practice, I don’t actually check the bottle cap codes on soda or follow up on coupons making similar offers.)

b) My intuition is automatically factoring in estimates for pranks, scams, matched tickets made in error which were not honored, and so on. If a smarmy guy in a suit is standing around handing out URLs promising money, or if cameras are hiding in the bushes...

c) See (b)... What’s the lesson here? I don’t think I got it. Mine would be “we automatically do all this stuff in non-abstract scenarios”.
shminux 4 Feb 2015 16:07 UTC
0 points
To the commenters who think that these are trivial questions: consider slight modifications. For example, what if it’s a car key you are looking for? What if your friend has a set of keys just like your own? What if your roommate is a known prankster and this is Apr 1? What if you have recently been had or almost had by a lottery scam? And so on.
- Luke_A_Somers 4 Feb 2015 18:13 UTC
  0 points
  Parent
  In those cases I would need to check that they are the right keys after finding them. Everything else (part 1) remains the same.
  
  For the lottery scam, well, I was assuming I had heard of the lottery. This is very likely since IIRC the only legal lotteries here are run by the state. The exact answer depends on too many contingent details. More importantly, it does not strongly affect my actions—I am going to be careful as anything with receiving that money—I’d visit the bank and make a new account to put it in, say, even if I’m 90% confident that it’s legit.
[deleted] 4 Feb 2015 14:49 UTC
0 points
I like this post but am a little curious why you posed this as an imaginary statistics exam… perhaps because I am in the midst of preparing some non-imaginary exam questions related to making inferences from data at the moment. Pushing the exam theme a little further, I am curious about how you might evaluate the answers, were this a non-imaginary exam.
ike 4 Feb 2015 14:37 UTC
0 points

(a) Should these considerations modify your confidence or anyone else’s that you have in fact found your keys? If not, why not, and if so, what correction is required?

No, the evidence that the keys were found is strong enough to outweigh the problems in the searching method.

(b) Should these considerations affect your subsequent decisions (e.g. to go out, locking the door behind you)?

No.

(a) Once the lottery results are out, do you check your ticket? Why, or why not?

Personally, I’d probably check it for fun. It would take less than a minute, and provides enough amusement to be worth the time. I could easily see myself not checking if I forgot, though.

(b) Suppose that you do, and it appears that you have won a very large sum of money. But you remember that the prior chance of this happening was 1 in 100 million. How confident are you at this point that you have won? What alternative hypotheses are also raised to your attention by the experience of observing the coincidence of the numbers on your ticket and the numbers on the lottery web site?

Very confident, let’s say >99.9%.

No other hypotheses are raised to my attention. You could ask for hypotheses conditioning on me not really winning, but none of those are likely enough to outweigh the ¹⁄₁₀₀ million chance of actually winning.

Perhaps I’d check the numbers a few other times, using proxies to rule out the “hacked” hypotheses, but that’s only because that’s easy to check, not because I would take it seriously.

(c) Suppose that you go through the steps of contacting the lottery organisers to make a claim, having them verify the ticket, collecting the prize, seeing your own bank confirm the deposit, and using the money in whatever way you think best. At what point, if any, do you become confident that you really did win the lottery? If never, what alternative hypotheses are you still seriously entertaining, to the extent of acting differently on account of them?

The point I become confident is after I check the numbers.

Edit: after reading devas’s comment: I had assumed that the lottery ticket was state-run and well-known. If not, I would be a lot less confident (in fact, I’ve already won dozens of these lotteries by virtue of having an email account! Aren’t those people who randomly choose email accounts to win money so nice?)
devas 4 Feb 2015 14:10 UTC
0 points
2) a) I check the ticket, assuming I have nothing better to do and that I remember it. To be more precise, if there is a family emergency and I have to drive to the hospital for whatever reason, I will not go out of my way to jury-rig an internet connection and I won’t look for the ticket before going out. I check the ticket because even a one in a million chance of free money is still free money.

b) I am not very confident; I’m not sure, but a grossly inaccurate measure of how confident I would be is that I’d think there is a ¹⁄₁₀ chance of me having won. Other alternative hypotheses are the lottery site being a phishing trap, the site being nothing more than advertisement, additional requirements which would make the collection of the prize impossible (you need the lottery ticket, the receipt, it must be collected yesterday and it can only be deposited in the Monte dei Paschi di Siena).

c) Between collecting the prize and seeing the bank confirm the deposit, depending on what other additional information I’ve seen (skeeviness of the lottery operators, state of the office where I collected the prize, and so on).