gwern comments on 2012 Survey Results

gwern 29 Nov 2012 19:37 UTC
2 points
By the way, has anyone figured out how to load the CSV in R? read.table chokes with errors like “Error in read.table("for_public.csv", header = TRUE, quote = "", sep = ",") : more columns than column names”. dos2unix doesn’t help, and the file looks fine when loaded in OpenOffice, but re-exporting it as CSV leads to the same darn errors. Mucking about, I can get it to load if I delete everything but the first entry and then the last 4 commas, but this solution doesn’t work for any additional entries!
- EvelynM 29 Nov 2012 19:43 UTC
  9 points
  Parent
  This worked: “dat <- read.csv('http://raikoth.net/Stuff/LessWrong/for_public.csv')”
  - gwern 29 Nov 2012 20:02 UTC
    0 points
    Parent
    That works, thanks.
- ChristianKl 29 Nov 2012 21:03 UTC
  4 points
  Parent
  Another R issue. How do I convert the scores from IQTest into real numbers? as.numeric seems to do strange things.
  - Matt_Simpson 29 Nov 2012 21:43 UTC
    10 points
    Parent
    Use “as.numeric(as.character(dat$IQTest))”.
    
    The IQTest data is stored as a factor. A factor variable has a set of levels, numbered 1, 2, 3, …, which are the values the variable can possibly take on, and a label for each level. as.numeric(X) returns the level numbers of X. as.character(X) returns the labels of X. When the labels are actually numbers (usually numbers that R is interpreting as character labels for some reason), as.numeric(as.character(X)) returns the numeric values that R was treating as labels.
    
    EDIT:
    
    In this case, when no value for IQTest was reported, it was stored as " " instead of "", which made R think the variable contained character data, which R defaults to treating as a factor. The " " entries should all be NAs once it’s converted properly.
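
    A minimal sketch of the difference, using a made-up vector in place of the real column (the values and the names raw, f, codes and scores are only illustrative, not taken from the survey file):

    ```r
    # Hypothetical stand-in for dat$IQTest: scores plus a " " for "no answer".
    raw <- c("130", " ", "115", "130", "98")
    f <- factor(raw)

    # The labels R stored, one per distinct value.
    levels(f)

    # Level numbers: small integers that index into levels(f), not the scores.
    codes <- as.numeric(f)

    # Go through the labels to recover the scores; the " " entry becomes NA.
    scores <- as.numeric(as.character(f))
    scores
    ```

    Comparing codes with scores shows why plain as.numeric looks like it is doing strange things.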
  - EvelynM 29 Nov 2012 21:23 UTC
    0 points
    Parent
    “iqt <- as.numeric(dat$IQTest)”. The already numeric IQ is in dat$IQ; iqt is only the suspect online IQTest.
    - Douglas_Knight 2 Dec 2012 5:36 UTC
      1 point
      Parent
      A serious design flaw in S/R means that your code fails badly and silently. The man page suggests that you try as.numeric(factor(5:10)).
      
      Matt Simpson’s code is correct. The man page suggests the slightly faster as.numeric(levels(dat$IQTest))[dat$IQTest]. That example also shows why they chose the coercion that they did.
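
      For anyone who wants to see it, here is the man page's factor(5:10) example written out as a short sketch (the variable names are mine):

      ```r
      f <- factor(5:10)

      # Plain coercion returns the level numbers, silently.
      codes <- as.numeric(f)                   # 1 2 3 4 5 6

      # Both of these recover the original values.
      via_char   <- as.numeric(as.character(f))   # 5 6 7 8 9 10
      via_levels <- as.numeric(levels(f))[f]      # 5 6 7 8 9 10
      ```

      The levels version converts each distinct label once and then indexes by the factor's integer codes, which is where the speedup comes from.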
- Douglas_Knight 29 Nov 2012 21:59 UTC
  2 points
  Parent
  If you really did use quote="", then you don’t have any quote character and it won’t work. But that’s probably some kind of markdown error. The default in read.table is to allow both double and single quotes, while the default in read.csv is to allow only double quotes; I find that if I change your argument to quote="\"" to allow only double quotes, then it reads the file with no errors. Another difference between read.table and read.csv is that read.table defaults to allowing # comments, which mangles one of the lines. This can be fixed with comment.char="", at which point read.csv and read.table produce the same result.
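
  A rough sketch of those two settings on a toy input (the CSV text below is invented for illustration and is not from the survey file; it just has an apostrophe in one field and a # in another):

  ```r
  # Made-up two-row CSV with an apostrophe and a '#' inside fields.
  csv <- "id,comment\n1,don't know\n2,issue #3 again\n"

  # read.table defaults to quote = "\"'" and comment.char = "#";
  # override both so only double quotes count and '#' is ordinary text.
  a <- read.table(text = csv, header = TRUE, sep = ",",
                  quote = "\"", comment.char = "")

  # read.csv already uses quote = "\"" and comment.char = "" by default.
  b <- read.csv(text = csv)

  # Compare the two parsed data frames.
  all.equal(a, b)
  ```

  The same two arguments can be passed when reading for_public.csv itself.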