I used the Python package pandas.
(I also tried Excel, but the dataset was too large to load in full. In retrospect, I realize I could have loaded just the first million rows, about 2/3 of the dataset and more than enough to get statistically significant results, and analyzed that, possibly keeping the remaining ~400k rows as a testing set.)
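For what it's worth, the same "first million rows, rest as a testing set" split is easy to do in pandas. Here is a minimal sketch, assuming the data is a CSV file with a single header row. The function name `split_head_and_rest` and the tiny in-memory demo data are made up for illustration; they are not the real dataset.

```python
import io

import pandas as pd


def split_head_and_rest(csv_source, n_head=1_000_000):
    """Return (first n_head data rows, all remaining rows) as two DataFrames.

    csv_source may be a file path or a seekable file-like object (assumption:
    the CSV has exactly one header line and no multi-line quoted fields).
    """
    # nrows caps the number of data rows parsed; the header is not counted.
    head = pd.read_csv(csv_source, nrows=n_head)

    # A file-like object has been partly consumed, so rewind it before re-reading.
    if hasattr(csv_source, "seek"):
        csv_source.seek(0)

    # Skip file lines 1..n_head (the rows already loaded), keeping line 0, the header.
    rest = pd.read_csv(csv_source, skiprows=range(1, n_head + 1))
    return head, rest


# Hypothetical stand-in for the real file: five data rows, split 3 / 2.
demo_csv = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n")
train_df, holdout_df = split_head_and_rest(demo_csv, n_head=3)
print(len(train_df), len(holdout_df))
```

One caveat with this kind of split: the first million rows are only a fair sample if the file isn't sorted in some meaningful way (by date, say). If it is, the holdout rows could differ systematically from the ones analyzed.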