One problem is that some data may fundamentally be a distraction, so modeling it is just a waste of time.
But it is very hard to tell ahead of time whether a piece of data will be relevant to a downstream analysis. As an example, in my work on text analysis, handling capitalization takes far more effort than its apparent interest would suggest. It is tempting to just throw away case information by lowercasing everything. But capitalization actually carries clues that are relevant to parsing and other analyses; in particular, it lets you identify acronyms, which usually stand for proper nouns.
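To make the point concrete, here is a minimal sketch of why lowercasing is lossy. The all-caps regex heuristic below is a hypothetical illustration, not a real acronym detector; production systems use much richer features.

```python
import re

def find_acronym_candidates(text):
    """Flag all-caps tokens of two or more letters as acronym candidates.

    A crude heuristic for illustration only.
    """
    return re.findall(r"\b[A-Z]{2,}\b", text)

sentence = "The WHO and NASA released a joint report."
print(find_acronym_candidates(sentence))          # ['WHO', 'NASA']
print(find_acronym_candidates(sentence.lower()))  # [] -- the cue is gone
```

Once the text is lowercased, no surface feature distinguishes the acronyms from ordinary words, so any downstream model that could have used that cue has lost it for good.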
Would anyone want to literally do this on something as complex as patient data?
If not, why not just try to build the best models you can?
Pick a couple of quantities of interest and try to model them as accurately as you can.