gwern comments on How to Evaluate Data?

gwern 10 Apr 2013 17:40 UTC
My suggestion would be to go via some sort of meta-analysis or meta-meta-analysis (yes, that’s a thing); if you have, for example, a meta-analysis of all results in a particular field and how often they replicate, you can infer pretty accurately how well a new result in that field will replicate. (An example use: ‘So 90% of all the previous results with this sample size or smaller failed to replicate? Welp, time to ignore this new result until it does replicate.’)
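The inference in the parenthetical amounts to a base-rate estimate. Below is a minimal sketch of it. It assumes a hypothetical input format of (sample size, replicated?) pairs coded from a replication meta-analysis. The function name and the add-one (Laplace) smoothing are illustrative choices, not something the comment specifies. The smoothing is the posterior mean under a uniform Beta(1, 1) prior.

```python
from typing import Iterable, Tuple


def replication_base_rate(
    history: Iterable[Tuple[int, bool]],
    max_sample_size: int,
) -> float:
    """Estimate P(replicates) for a new result whose sample size is <= max_sample_size.

    history: (sample_size, replicated) pairs, e.g. as coded from a meta-analysis
    of replication attempts (hypothetical input format).
    Add-one (Laplace) smoothing, i.e. the posterior mean under a uniform
    Beta(1, 1) prior, keeps a handful of past results from giving exactly 0 or 1.
    """
    successes = 0
    trials = 0
    for n, replicated in history:
        if n <= max_sample_size:  # only results at or below the size threshold count
            trials += 1
            successes += int(replicated)
    return (successes + 1) / (trials + 2)
```

Suppose 10 past results at or below the threshold were coded and 1 of them replicated. The function then returns (1 + 1) / (10 + 2) ≈ 0.17. That is close to the raw 10% rate, but pulled toward 50% because ten results is not much evidence.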

It would of course be a ton of work to compile them all, and for any new result you were interested in, you'd still have to know how to code it up in terms of sample size, which sub-sub-field it was in, what the quantitative measures were, etc.; but at least it doesn't require nigh-magical AI or NLP, just a great deal of human effort.
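The coding step might look like the minimal sketch below. The record fields mirror the features listed above: sample size, sub-sub-field, and quantitative measure. Several names and rules here are hypothetical choices for illustration, not an established scheme:

- the class name `CodedResult`
- the `comparable_results` helper
- the rule that "comparable" means same sub-field, same measure, known replication status, and sample size within a factor of two

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class CodedResult:
    # Hypothetical coding scheme: the fields a human coder would fill in.
    sample_size: int
    subfield: str                      # sub-sub-field label
    measure: str                       # kind of quantitative outcome measure
    replicated: Optional[bool] = None  # None = replication status unknown


def comparable_results(
    history: Sequence[CodedResult],
    new: CodedResult,
    size_ratio: float = 2.0,
) -> List[CodedResult]:
    """Return prior results that match the new result closely enough to serve
    as its reference class: same sub-field, same measure, known replication
    status, and sample size within a factor of `size_ratio` of the new one."""
    lo = new.sample_size / size_ratio
    hi = new.sample_size * size_ratio
    return [
        r
        for r in history
        if r.subfield == new.subfield
        and r.measure == new.measure
        and r.replicated is not None
        and lo <= r.sample_size <= hi
    ]
```

The replication rate among the matches, `sum(r.replicated for r in matches) / len(matches)`, is then the base rate for the new result. Guard against an empty match list before dividing.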
- DaFranker 10 Apr 2013 18:52 UTC
  Nigh-magical is the word indeed. I just realized that if my insane idea in the grandparent were made to work, it could be unleashed upon all research publications ever, everywhere, for mining data, figures, estimates, etc., and then output a giant belief network of "this is collective-human-science's current best guess for fact / figure / value / statistic X".
  
  That does not sound like something that could be achieved by any developer smaller than Google. It also fails all of my incredulity and sanity checks.
  
  (it also sounds like an awesome startup idea, whatever that means)
  - gwern 10 Apr 2013 19:14 UTC
    Or IBM-sized. But if you confined your ambitions to analyzing just meta-analyses, it would be much more doable. The narrower the domain, the better AI/NLP works, remember. There are some remarkable examples of what you can do in machine-reading a narrow domain and extracting meaningful scientific data. One of them is ChemicalTagger (demo), which reads chemistry papers describing synthesis processes and extracts the process, although it has serious problems getting hold of papers to use. I bet you could get a lot out of reading meta-analyses: there's a good summary just in the forest plot used in almost every meta-analysis.
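Reading a forest plot's numbers is the kind of narrow-domain extraction meant here. Below is a minimal sketch. It rests on several assumptions:

- The plot's rows are already available as text of the hypothetical form `label estimate [lower, upper]`. Real forest plots are often images, which would need a separate extraction step.
- Effects are on an additive scale, such as a mean difference or log odds ratio, with symmetric 95% intervals.
- A simple fixed-effect inverse-variance pool is acceptable.

The regex, the function names, and the pooling choice are all illustrative. A random-effects model such as DerSimonian-Laird is the usual alternative when the studies disagree.

```python
import math
import re
from typing import List, Tuple

# Hypothetical text layout for one forest-plot row: "label  estimate [lower, upper]".
ROW = re.compile(
    r"^(?P<label>.+?)\s+(?P<est>-?\d+(?:\.\d+)?)\s*"
    r"\[(?P<lo>-?\d+(?:\.\d+)?),\s*(?P<hi>-?\d+(?:\.\d+)?)\]\s*$"
)
Z_95 = 1.959964  # two-sided 95% standard normal quantile


def parse_rows(text: str) -> List[Tuple[str, float, float]]:
    """Return (label, estimate, standard error) for every line matching ROW.

    Assumes symmetric 95% CIs on an additive scale, so SE = CI width / (2 * 1.96).
    Lines that don't match (headers, totals, etc.) are skipped.
    """
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            est, lo, hi = (float(m[k]) for k in ("est", "lo", "hi"))
            rows.append((m["label"], est, (hi - lo) / (2 * Z_95)))
    return rows


def fixed_effect_pool(rows: List[Tuple[str, float, float]]) -> Tuple[float, float]:
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE.

    Raises ZeroDivisionError if rows is empty.
    """
    weights = [1 / se**2 for _, _, se in rows]
    total = sum(weights)
    est = sum(w * y for w, (_, y, _) in zip(weights, rows)) / total
    return est, math.sqrt(1 / total)
```

The standard error is recovered from the interval width because a 95% interval on an additive scale spans about ±1.96 standard errors. For ratio measures such as odds ratios, take logs of the estimate and bounds first.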