R support group and the benefits of applied statistics

Following the interest in this proposal a couple of weeks ago, I’ve set up a Google Group to give people a venue to discuss R, talk about their projects, seek advice, and share resources, and to provide a social motivator for honing their skills. Having done this, I’d now like to bullet-point a few reasons for learning applied statistical skills in general, and R in particular:

The General Case:

- Statistics seems to be a subject where it’s easy to delude yourself into thinking you know a lot about it. This is apparent on Less Wrong. Although there are many subject experts here, there are also a lot of people making bold pronouncements about Bayesian inference who wouldn’t recognise a beta distribution if it sat on them. Don’t be that person! It’s hard to fool yourself into thinking you know something when you have to apply it in practice.

- Whenever you think “I wonder what kind of relationship exists between [x] and [y]”, it’s within your power to investigate this.

- Statistics has a rich conceptual vocabulary for reasoning about how observations generalise, and how useful those generalisations might be when making inferences about future observations. These are the sorts of skills we want to be practising as aspiring rationalists.

- Scientific literature becomes a lot more readable when you appreciate the methods behind it. You’ll have a much greater understanding of a scientific finding if you appreciate what it means in the context of statistical inference, rather than going off whatever paraphrased upshot is given in the abstract.

- Statistical techniques make use of fundamental mathematical methods in an applicable way. If you’re learning linear algebra, for example, and you want an intuitive understanding of eigenvectors, you could do a lot worse than learning about principal component analysis.
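
As a rough illustration of that last point, here’s a minimal sketch in base R. The data are simulated and the variable names and numbers are made up for the example. It relies on the standard result that the principal axes of a dataset are the eigenvectors of its covariance matrix, so `eigen()` on `cov()` should agree with `prcomp()` up to the sign of each vector:

```r
# Simulated, purely illustrative data: two correlated variables.
set.seed(42)
n <- 500
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 0.5)
dat <- cbind(x = x, y = y)

# Route 1: eigendecomposition of the sample covariance matrix.
eig <- eigen(cov(dat))

# Route 2: principal component analysis with base R's prcomp(),
# which centres the data by default.
pca <- prcomp(dat)

# Eigenvectors vs. principal axes (signs may be flipped).
print(eig$vectors)
print(unclass(pca$rotation))

# Eigenvalues vs. variances of the principal components.
print(eig$values)
print(pca$sdev^2)
```

Each eigenvector is a direction in the data, and its eigenvalue is the variance along that direction, which is a fairly concrete way to see what the linear algebra is doing.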

R in particular:

- It’s non-proprietary (read “free”). Many competing products are ridiculously expensive to license.

- Since it’s common in academia, newer or more exotic statistical tools and procedures are more likely to have been implemented and made available in R than in proprietary statistical packages or other software libraries.

- R skills are a strong signal of technical competence that will distinguish you from SPSS mouse-jockeys.

- There are many out-of-the-box packages for carrying out statistical procedures that you’d probably have to cobble together yourself if you were working in Python or Java.

- Having said that, popular languages such as Python and Java have libraries for interfacing with R (rpy2 and rJava, for example).

- There’s a discussion/support group for R with Less Wrong users in it. :-)