btrettel comments on Mistakes repository

btrettel 1 Nov 2013 21:27 UTC
Two mistakes from when I was figuring out where to go for my PhD:
- Not applying to more PhD programs. I limited myself to four because (a) I only wanted to apply to strong programs in my field, (b) I was very picky about location, and (c) I was too confident. Ultimately, this worked out okay, but I now regard the decision as arrogant and risky.
- Not validating the code and data I used to decide which PhD programs to apply to and which one to choose. As part of these decisions, I collected data about many things. For one of the more important ones, I wrote a web scraper to collect the data and processing scripts to distill it into something useful; in this case, the result was a table ranking cities by a particular quantity. Unfortunately, some data corruption combined with sloppy coding seriously skewed my results, and I did not notice the problem until well after I had made my decision. After restoring a backup of the data and writing a validation script to check its quality, I found that the city I had selected was actually among the worst in this quantity, not just acceptably worse than average, as I had previously thought. Oops. Software Carpentry has some good notes about how to handle data like this. (This point is intentionally written abstractly because what the data represents is not important.)
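My actual scripts aren't shown here. The following is only a minimal Python sketch of the kind of validation pass that could catch corrupted records before they reach a ranking. It assumes the scraped data is a list of dicts with hypothetical `city` and `value` fields, and the function names and optional `lo`/`hi` bounds are illustrative too. The key idea is that ranking refuses to run on data that fails validation.

```python
import math


def validate_records(records, required=("city", "value"), lo=None, hi=None):
    """Return human-readable problems found in scraped records.

    Assumes each record is a dict; the field names are hypothetical.
    """
    problems = []
    seen = set()
    for i, rec in enumerate(records):
        # Missing or empty fields usually mean the scraper or parser broke.
        missing = [k for k in required if k not in rec or rec[k] in (None, "")]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
            continue
        city = rec["city"]
        # Duplicate rows can silently double-count a city.
        if city in seen:
            problems.append(f"record {i}: duplicate city {city!r}")
        seen.add(city)
        # Values must parse as finite numbers.
        try:
            value = float(rec["value"])
        except (TypeError, ValueError):
            problems.append(f"record {i}: non-numeric value {rec['value']!r}")
            continue
        if math.isnan(value) or math.isinf(value):
            problems.append(f"record {i}: non-finite value {value}")
        elif (lo is not None and value < lo) or (hi is not None and value > hi):
            # Optional plausibility bounds for the quantity being ranked.
            problems.append(f"record {i}: value {value} outside [{lo}, {hi}]")
    return problems


def rank_cities(records, **bounds):
    """Sort records by value (descending), but only if validation passes."""
    problems = validate_records(records, **bounds)
    if problems:
        raise ValueError("data failed validation:\n" + "\n".join(problems))
    return sorted(records, key=lambda r: float(r["value"]), reverse=True)


if __name__ == "__main__":
    good = [{"city": "A", "value": "3.5"}, {"city": "B", "value": 7}]
    bad = good + [{"city": "C", "value": "n/a"}, {"city": "A", "value": 1}]
    print([r["city"] for r in rank_cities(good)])
    print(validate_records(bad))
```

Raising an error instead of skipping bad rows is a deliberate choice in this sketch: silently dropping or coercing corrupted entries is exactly how a skewed ranking can go unnoticed.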