[Link] IBM to set Watson loose on cancer genome data

http://arstechnica.com/science/2014/03/ibm-to-set-watson-loose-on-cancer-genome-data/

Can anyone better informed about Watson, or about machine learning in general, comment on this application? Specifically, I’m interested in an explanation of this part:

[...] the National Institutes of Health has compiled lists of biochemical pathways—signaling networks and protein interactions—and placed them in machine-readable formats. Once those were imported, Watson’s text analysis abilities were set loose on the NIH’s PubMed database, which contains abstracts of nearly every paper published in peer-reviewed biomedical journals.

Over time, Watson will develop its own sense of what sources it looks at are consistently reliable. Royyuru told Ars that, if the team decides to, it can start adding the full text of articles and branch out to other information sources. Between the known pathways and the scientific literature, however, IBM seems to think that Watson has a good grip on what typically goes on inside cells.

Is the idea that Watson will first be trained on input data in a standard format, and will then read plain-text articles and draw conclusions from them? It also sounds like they expect Watson to be able to tell which studies are “good” studies, which sounds incredible (in both senses of the word).