I don’t see why I should give up just because what I’ve got isn’t convenient to work with. The data is what it is; I want to use it in a Bayesian update of my prior probabilities that the 1995 data is kosher or made up.
Well heck, no one can stop you from intellectual masturbation. Just because it emits nothing anyone else wants to touch is not a reason to avoid doing it.
But you’re working with made up data, the only real data is a high level summary which doesn’t tell you what you want to know, you have no reasonably defined probability distribution, no defensible priors, and you’re working towards justifying a conclusion you reached days ago (this exercise is a perfect example of motivated reasoning: “I dislike this data, and it turns out I am right since some of it was completely made up, and now I’m going to prove I’m extra-right by exhibiting some fancy statistical calculations involving a whole bunch of buried assumptions and choices which justify the already written bottom line”).
My more elaborate procedure is only trying to refine this judgment by taking into account the entire joint probability distribution and trying to “hug the query” as much as possible. With the simulation I can not only pinpoint how astronomically unlikely the coincidence is, but also tell you how much “slop” in categories would be plausible. (If you look for a match within 5% rather than within 1%, then the probability of a coincidence rises to less-than-significant.)
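A minimal sketch of the kind of simulation I mean, for concreteness. Every number in it is a placeholder — hypothetical category shares and an assumed normal cost distribution — since the point is the procedure, not the actual published figures:

```python
import random

# All numbers here are placeholders: hypothetical dollar shares standing in
# for the published category percentages, and an assumed cost distribution.
TARGET_SHARES = [0.47, 0.29, 0.19, 0.03, 0.02]  # dollar shares to match
MEAN_COST, SD_COST = 4.0, 2.0                   # assumed, in $M

def one_sample(n=9):
    """Draw n projects: each gets a normally distributed cost and a category
    chosen at the target shares; return the dollar-weighted category shares."""
    totals = [0.0] * len(TARGET_SHARES)
    for _ in range(n):
        cost = max(0.1, random.gauss(MEAN_COST, SD_COST))  # keep costs positive
        cat = random.choices(range(len(TARGET_SHARES)), weights=TARGET_SHARES)[0]
        totals[cat] += cost
    grand = sum(totals)
    return [t / grand for t in totals]

def coincidence_prob(tolerance=0.01, trials=50_000):
    """Estimate the chance that a fresh 9-project sample reproduces every
    category share to within `tolerance`."""
    hits = 0
    for _ in range(trials):
        shares = one_sample()
        if all(abs(s - t) <= tolerance for s, t in zip(shares, TARGET_SHARES)):
            hits += 1
    return hits / trials
```

Widening `tolerance` from 0.01 to 0.05 is exactly the "slop" knob described above.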
I’ve already pointed out that under a reasonable interpretation of the imaginary data, the observed frequencies are literally the most likely outcome. Would your procedure make any sense if run on, say, lottery tickets?
I don’t have to assume anything at all about the 1995 data (such as how many projects it represents), because as I’ve stated earlier $37B is the entire DoD spend in that year—if the data isn’t made up then it amounts to an exhaustive survey rather than a sampling, and thus the observed frequencies are population frequencies...My reasoning is as follows: assume the costs of the projects are drawn from a normal distribution.
As I said. Assumptions.
Here is a corrected version of the code. I’ve also fixed the SD of the sample, which I miscalculated the first time around.
Although it’s true that even if you make stuff up and choose to interpret things weirdly in order to justify the conclusion, the code should at least do what you wanted it to.
Do you disagree that the presence in a small sample of two instances of very rare species constitutes strong prima facie evidence against the “coincidence” hypothesis?
I’ve already pointed out that under a reasonable interpretation of the imaginary data, the observed frequencies are literally the most likely outcome. Would your procedure make any sense if run on, say, lottery tickets?
I don’t know what you mean by the above, despite doing my best to understand. My intuition is that “the most likely outcome” is one in which our 9-project sample will contain no project in either of the “very rare” categories, or at best will have a project in one of them. (If you deal me nine poker hands, I do not expect to see three-of-a-kind in two of them.)
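The poker intuition can be checked directly, assuming independent hands and the standard count of three-of-a-kind hands (54,912 of the 2,598,960 possible five-card hands):

```python
from math import comb

# Probability a single 5-card hand is exactly three-of-a-kind.
p = 54_912 / comb(52, 5)          # about 0.0211

# Binomial probability that at least 2 of 9 independent hands are
# three-of-a-kind: 1 minus P(zero such hands) minus P(exactly one).
p_two_or_more = 1 - (1 - p)**9 - 9 * p * (1 - p)**8
print(round(p_two_or_more, 4))    # about 0.015: unlikely, as intuited
```

(Nine hands dealt from one deck aren't quite independent, but the approximation is close enough to support the intuition.)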
I didn’t understand your earlier example using chi-squared, which is what I take you to mean by “already pointed out”. You made up some data, and “proved” that chi-squared failed to reject the null when you asked it about the made-up data. You assumed a sample size of 100, when the implausibility of the coincidence hypothesis comes precisely from the much smaller sample size (plus the existence of “rare” categories and the overall number of categories).
a perfect example of motivated reasoning
I’m experiencing it as the opposite—I already have plenty of reasons to conclude that the 1995 data set doesn’t exist, I’m trying to give it the maximum benefit of the doubt by assuming that it does exist and evaluating its fit with the 1979 data purely on probabilistic merits.
(ETA: what I’m saying is, forget the simulation, on which I’m willing to cop to charges of “intellectual masturbation”. Instead, focus on the basic intuition. If I’m wrong about that, then I’m wrong enough that I’m looking forward to having learned something important.)
(ETA2: the fine print on the chi-square test reads “for the chi-square approximation to be valid, the expected frequency should be at least 5”—so in this case the test may not apply.)
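To make the fine print concrete, using hypothetical shares in place of the published percentages: with a sample of 9, even the largest expected count falls short of 5, never mind the rare categories:

```python
n = 9
shares = [0.47, 0.29, 0.19, 0.03, 0.02]   # hypothetical category shares
expected = [n * s for s in shares]

# The chi-square approximation wants every expected count to be at least 5;
# here the largest is 9 * 0.47 = 4.23 and the rarest is 9 * 0.02 = 0.18.
print(expected, all(e >= 5 for e in expected))
```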
Do you disagree that the presence in a small sample of two instances of very rare species constitutes strong prima facie evidence against the “coincidence” hypothesis?
Why is coincidence a live hypothesis here? Surely we might expect there to be some connection—the numbers are ostensibly about the same government in the same country in different time periods. Another example of what I mean when I say you are making a ton of assumptions: you have not defined what parameters, distributions, or sets of models you are working with. This is simply not a well-defined problem so far.
I didn’t understand your earlier example using chi-squared, which is what I take you to mean by “already pointed out”. You made up some data, and “proved” that chi-squared failed to reject the null when you asked it about the made-up data. You assumed a sample size of 100, when the implausibility of the coincidence hypothesis comes precisely from the much smaller sample size (plus the existence of “rare” categories and the overall number of categories).
And as I mentioned, I could do no other because the percentages simply cannot work as frequencies appropriate for any discrete tests with a specific sample of 9. I had to inflate to a sample size of 100 so I could interpret something like 2% as meaning anything at all.
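The inflation move can be sketched like this, with invented counts for illustration (percentages read as counts out of 100). Pearson's statistic comes out far below the 5%-level critical value of about 9.49 for four degrees of freedom, so the test fails to reject:

```python
# Invented counts for illustration: percentages read as counts out of 100.
observed = [45, 31, 18, 4, 2]
expected = [47, 29, 19, 3, 2]

# Pearson's chi-square goodness-of-fit statistic, with k - 1 = 4 degrees
# of freedom for k = 5 categories.
chi2 = sum((o - e)**2 / e for o, e in zip(observed, expected))
print(round(chi2, 3))   # about 0.609, nowhere near the 9.49 cutoff
```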
What I mean by “coincidence” is “the 1979 data was obtained by picking at random from the same kind of population as the 1995 data, and the close fit of numbers results from nothing more sinister than an honest sampling procedure”.
You still haven’t answered a direct question I’ve asked three times—I wish you would shit or get off the pot.
(ETA: the 1979 document actually says that the selection wasn’t random: “We identified and analyzed nine cases where software development was contracted for with Federal funds. Some were brought to our attention because they were problem cases.”—so that sample would have been biased toward projects that turned “bad”. But this is one of the complications I’m choosing to ignore, because it weighs on the side where my priors already lie—that the 1995 frequencies can’t possibly match the 1979 frequencies that closely without the former being a textual copy of the latter. I’m trying to be careful that all the assumptions I make, when I find I have to make them, work against the conclusion I suspect is true.)
What I mean by “coincidence” is “the 1979 data was obtained by picking at random from the same kind of population as the 1995 data,
What population is that?
You still haven’t answered a direct question I’ve asked three times—I wish you would shit or get off the pot.
You are not asking meaningful questions, you are not setting up your assumptions clearly. You are asking me, directly, “Is bleen more furfle than blaz, if we assume that quux>baz with a standard deviation of approximately quark and also I haven’t mentioned other assumptions I have made?” Well, I can answer that quite easily: I have no fucking idea, but good luck finding an answer.
While we are complaining about not answering, you have not answered my questions about coin flipping or about lotteries.
you have not answered my questions about coin flipping or about lotteries.
(You didn’t ask a question about coin flipping. The one about lotteries I answered: “I don’t know what you mean”. Just tying up any loose ends that might be interpreted as logical rudeness.)
Answered already—if the 1995 data set exists, then it pretty much has to be a survey of the entire spend of the US Department of Defense on software projects; a census, if you will. (Whether that is plausible or not is a separate question.)
You are not asking meaningful questions
Okay, let me try another one then. Suppose we entered this one into PredictionBook: “At some point before 2020, someone will turn up evidence such as a full-text paper, indicating that the 1995 Jarzombek data set exists, was collected independently of the 1979 GAO data set, and independently found the same frequencies.”
What probability would you assign to that statement?
I’m not trying to set up any assumptions, I’m just trying to assess how plausible the claim is that the 1995 data set genuinely exists, as opposed to its being a memetic copy of the 1979 study. (Independently even of whether this was fraud, plagiarism, an honest mistake, or whatever.)
What probability would you assign to that statement?
Very low. You’re the only one that cares, and government archives are vast. I’ve failed to find versions of many papers and citations I’d like to have in the past.