Beyond Statistics 101

Is statistics beyond introductory statistics important for general reasoning?

Ideas such as regression to the mean, the fact that correlation does not imply causation, and the base rate fallacy are very important for reasoning about the world in general. One gets these from a deep understanding of statistics 101 and the basics of the Bayesian statistical paradigm. Up until one year ago, I was under the impression that more advanced statistics is technical elaboration that doesn’t offer major additional insights into thinking about the world in general.
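As one concrete example, the base rate fallacy falls directly out of Bayes’ theorem. Here is a minimal sketch in Python (the prevalence and test accuracy figures are hypothetical, chosen only for illustration):

```python
# A classic base rate fallacy setup: a test that is "99% accurate"
# can still yield mostly false positives when the condition is rare.
# All numbers here are hypothetical, chosen for illustration.

prevalence = 0.001          # base rate: 1 in 1,000 people has the condition
sensitivity = 0.99          # P(test positive | has condition)
false_positive_rate = 0.01  # P(test positive | no condition)

# Bayes' theorem:
#   P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))
p_condition_given_positive = sensitivity * prevalence / p_positive

print(f"P(condition | positive test) = {p_condition_given_positive:.3f}")
# Prints roughly 0.090: despite the impressive-sounding test accuracy,
# a positive result means only about a 9% chance of having the condition,
# because the low base rate dominates.
```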

Nothing could be further from the truth: ideas from advanced statistics are essential for reasoning about the world, even on a day-to-day level. In hindsight my prior belief seems very naive – as far as I can tell, my only reason for holding it is that I hadn’t heard anyone say otherwise. But I hadn’t actually looked into advanced statistics to see whether or not my impression was justified :D.

Since then, I’ve learned some advanced statistics and machine learning, and the ideas that I’ve learned have radically altered my worldview. The “official” prerequisites for this material are calculus, multivariable differential calculus, and linear algebra. But one doesn’t actually need detailed knowledge of these to understand ideas from advanced statistics well enough to benefit from them. The problem is pedagogical: I need to figure out how to communicate them in an accessible way.

Advanced statistics enables one to reach nonobvious conclusions

To give a bird’s eye view of the perspective that I’ve arrived at: in practice, the ideas from “basic” statistics are generally useful primarily for disproving hypotheses. This pushes in the direction of a state of radical agnosticism: the idea that one can’t really know anything for sure about lots of important questions. More advanced statistics enables one to become justifiably confident in nonobvious conclusions, often even in the absence of formal evidence coming from standard scientific practice.

IQ research and PCA as a case study

In the early 20th century, the psychologist and statistician Charles Spearman discovered the g-factor, which is what IQ tests are designed to measure. The g-factor is one of the most powerful constructs that’s come out of psychology research. There are many factors that played a role in enabling Bill Gates’s ability to save perhaps millions of lives, but one of the most salient is his IQ being in the top ~1% of his class at Harvard. IQ research helped the Gates Foundation to recognize iodine supplementation as a nutritional intervention that would improve socioeconomic prospects for children in the developing world.

The work of Spearman and his successors on IQ constitutes one of the pinnacles of achievement in the social sciences. But while Spearman’s discovery of IQ was a great discovery, it wasn’t his greatest discovery. His greatest discovery was a discovery about how to do social science research: he pioneered the use of factor analysis, a close relative of principal component analysis (PCA).

The philosophy of dimensionality reduction

PCA is a dimensionality reduction method. Real-world data often turns out to be reducible in a surprising way: a small number of latent variables explain a large fraction of the variance in the data.

This is related to the effectiveness of Occam’s razor: it turns out to be possible to describe a surprisingly large amount of what we see around us in terms of a small number of variables. Only, the variables that explain a lot usually aren’t the variables that are immediately visible; instead they’re hidden from us, and in order to model reality, we need to discover them, which is the function that PCA serves. The small number of variables that drive a large fraction of the variance in data can be thought of as a sort of “backbone” of the data. That enables one to understand the data at a “macro / big picture / structural” level.
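To make this concrete, here is a minimal sketch in Python, loosely inspired by Spearman’s setup (the simulated test scores, the single latent “ability” variable, and all the numbers are hypothetical, invented for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Simulate 1,000 people taking 6 tests. Each person's scores are driven
# by a single hidden "ability" variable plus independent noise.
# (A made-up setup, loosely in the spirit of Spearman's g.)
n_people, n_tests = 1000, 6
ability = rng.normal(size=n_people)              # the latent variable
loadings = rng.uniform(0.6, 0.9, size=n_tests)   # how much each test reflects it
noise = rng.normal(scale=0.5, size=(n_people, n_tests))
scores = ability[:, None] * loadings[None, :] + noise

# PCA finds the directions of greatest variance in the observed scores.
pca = PCA(n_components=n_tests)
components = pca.fit_transform(scores)

# The first component alone explains most of the variance...
print(pca.explained_variance_ratio_.round(2))

# ...and it closely tracks the hidden variable we never observed directly.
print(abs(np.corrcoef(components[:, 0], ability)[0, 1]).round(2))
```

Nothing in the observed data labels the hidden variable; PCA recovers it purely from the correlations among the test scores, which is the same spirit in which Spearman used factor analysis on mental tests.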

This is a very long story that will take a long time to flesh out, and doing so is one of my main goals.