Here’s a step-by-step account of how I did my analysis. It should include enough detail for others to give feedback or to attempt replications.
Step 1: I picked out the 17 relevant variables: 5 regarding LW exposure (LessWrongUse, KarmaScore, Sequences, TimeinCommunity, Meetups), 5 regarding intelligence (IQ, SATscoresoutof1600, SATscoresoutof2400, ACTscoreoutof36, IQTest), 6 from the CFAR questions (CFARQuestion1, CFARQuestion2, CFARQuestion3, CFARQuestion4, CFARQuestion5, CFARQuestion7), and Country.
Step 2: I cleaned up the data (a rough code sketch follows this list):
removing or fixing non-numerical entries (e.g., on the anchoring question “~500” became 500, “100 Ft” became 100, “100m” became 328, and “we use metric in canada :)” became blank).
removing impossible responses (e.g., 1780 on SATscoresoutof1600)
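To make the cleanup concrete, here is a minimal pandas sketch. The toy rows, the parsing rules, and the SAT bounds are illustrative stand-ins, not the exact rules applied:

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the raw survey export.
raw = pd.DataFrame({
    "CFARQuestion7": ["~500", "100 Ft", "100m", "we use metric in canada :)"],
    "SATscoresoutof1600": [1480, 1780, np.nan, 1200],
})
df = raw.copy()

def parse_estimate(s):
    """Parse a free-text redwood-height answer into feet, or NaN."""
    s = str(s).strip().lower().replace("~", "").replace(",", "")
    if s.endswith("m") and s[:-1].strip().replace(".", "").isdigit():
        return float(s[:-1]) * 3.28  # metric answers: meters -> feet
    s = s.replace("ft", "").strip()
    try:
        return float(s)
    except ValueError:
        return np.nan  # unparseable answers become blank

df["CFARQuestion7"] = df["CFARQuestion7"].apply(parse_estimate)

# Remove impossible responses (the official score range is 400-1600).
sat = df["SATscoresoutof1600"]
df.loc[~sat.between(400, 1600), "SATscoresoutof1600"] = np.nan
```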
Step 3: I coded categorical variables to make them numerical and transformed skewed continuous variables to make them close to normally distributed (code sketch after the list):
LessWrongUse: coded 1,2,3,4,5, where 1=“I lurk, but never registered an account” and 5=“I’ve posted in Main”, and treated as a continuous variable
KarmaScore: took ln(karma + 1)
Sequences: coded 1,2,3,4,5,6,7, where 1=“Never even knew they existed until this moment” and 7=“Nearly all of the Sequences”, and treated as a continuous variable
TimeinCommunity: took sqrt
Meetups: coded 0,1, and treated as a continuous variable
CFARQuestion1: coded “Yes” as 1, other options as 0
CFARQuestion2: coded “$75 in 60 days” as 1, other option as 0
CFARQuestion3: coded “The smaller hospital” as 1, other options as 0
CFARQuestion4: coded “Drug A” as 1, other options as 0
CFARQuestion5: I created 2 new versions, ln(anchor+1) and sqrt(anchor)
CFARQuestion7: I created a new version, taking sqrt(estimate)
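A sketch of the recoding and transforms, continuing the df from the sketch above. The option strings for Meetups, the elided LessWrongUse levels, and showing only question 2's binary coding are all simplifications:

```python
# Hypothetical recoding; intermediate levels elided for space.
lw_use_map = {
    "I lurk, but never registered an account": 1,
    # ... three intermediate levels ...
    "I've posted in Main": 5,
}
df["LessWrongUse"] = df["LessWrongUse"].map(lw_use_map)

df["KarmaScore"] = np.log(df["KarmaScore"] + 1)        # ln(karma + 1)
df["TimeinCommunity"] = np.sqrt(df["TimeinCommunity"])
df["Meetups"] = (df["Meetups"] == "Yes").astype(int)   # option string assumed

# Binary "correct" coding, keeping missing answers missing.
q2 = df["CFARQuestion2"]
df["CFARQuestion2"] = np.where(q2.isna(), np.nan,
                               (q2 == "$75 in 60 days").astype(float))

# Candidate normalizing transforms for the anchoring pair.
df["anchor_ln"]     = np.log(df["CFARQuestion5"] + 1)
df["anchor_sqrt"]   = np.sqrt(df["CFARQuestion5"])
df["estimate_sqrt"] = np.sqrt(df["CFARQuestion7"])
```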
Step 4: I removed a few outliers and wildly implausible responses on the continuous variables (code sketch after the list):
3 on IQTest: 3, 18, and 66
2 on height of tallest redwood in ft: 1 and 10,000
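In code this is just blanking a handful of cells (assuming, as above, that CFARQuestion7 holds the redwood estimate in feet; the transformed versions from Step 3 would be recomputed afterwards):

```python
# Blank out the handful of wildly implausible continuous responses.
df.loc[df["IQTest"].isin([3, 18, 66]), "IQTest"] = np.nan
df.loc[df["CFARQuestion7"].isin([1, 10000]), "CFARQuestion7"] = np.nan
```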
Step 5: I created composite scales of LW exposure, Intelligence, and Accuracy Rate.
For LW exposure, I verified that the 5 variables (after transformation) were positively correlated with each other (they were), standardized each variable (giving it mean=0 and stdev=1), and then averaged them together. People with missing data on some of the questions got the average of the questions that they did answer.
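A minimal sketch of this composite, assuming the five transformed variables are columns of df:

```python
exposure_vars = ["LessWrongUse", "KarmaScore", "Sequences",
                 "TimeinCommunity", "Meetups"]
z = (df[exposure_vars] - df[exposure_vars].mean()) / df[exposure_vars].std()

# Sanity check: every pairwise correlation is positive.
assert (z.corr() > 0).all().all()

# The row-wise mean skips NaN, so partial responders get the average
# of whatever they answered; all-missing rows stay blank.
df["LWexposure"] = z.mean(axis=1)
```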
For Intelligence, I verified that the 5 variables were positively correlated with each other (they were) and standardized each variable (giving it mean=0 and stdev=1). I took advantage of the fact that 432 people responded to 2 or more of the questions to put them on the same scale. This probably isn’t the best way to do it, but what I did was run a principal components analysis. All 5 variables (unsurprisingly) loaded onto the same factor, so I saved that factor with imputation and called it “Intelligence”. This gave a score to everyone who answered at least one of the 5 questions, and was blank for those who answered none of them.
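The "saved that factor with imputation" step was presumably done inside a stats package; a rough numpy stand-in (not the exact procedure) takes the first principal component of the pairwise-complete correlation matrix and scores each respondent on whichever items they answered:

```python
iq_vars = ["IQ", "SATscoresoutof1600", "SATscoresoutof2400",
           "ACTscoreoutof36", "IQTest"]
zq = (df[iq_vars] - df[iq_vars].mean()) / df[iq_vars].std()

# First principal component of the pairwise-complete correlation matrix.
eigvals, eigvecs = np.linalg.eigh(zq.corr().values)
loadings = eigvecs[:, -1]              # eigh sorts ascending; take largest
loadings *= np.sign(loadings.sum())    # orient so higher score = smarter

# Score each respondent on whichever items they answered; respondents
# who answered none of the five stay blank (0/0 -> NaN).
wsum = (zq.notna() * loadings).sum(axis=1)
df["Intelligence"] = (zq.fillna(0) * loadings).sum(axis=1) / wsum
```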
For Accuracy Rate, I averaged together the scores on CFAR questions 1-4. For each person who answered at least 1 of the questions, this gave the percent of the questions that they got “correct” (out of the ones that they answered).
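Which is just a row-wise mean over the four binary items:

```python
# Percent "correct" among whichever of the four questions were answered.
cfar = ["CFARQuestion1", "CFARQuestion2", "CFARQuestion3", "CFARQuestion4"]
df["AccuracyRate"] = df[cfar].mean(axis=1) * 100  # NaN if none answered
```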
Step 6: I analyzed the first 4 CFAR questions (and the overall Accuracy Rate). To find the percent of LWers who got the “correct” answer to each, I just looked at the mean. To test if LW exposure predicted “correct” responses, I ran logistic regressions with LW exposure as the independent variable and each question as the dependent variable (for Accuracy Rate, I did a linear regression). To test if Intelligence predicted “correct” responses, I repeated these analyses with Intelligence as the independent variable. To test if these effects were independent, I repeated the analyses as multiple regressions with both LW exposure and Intelligence as IVs.
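With statsmodels these regressions look roughly like this, showing CFARQuestion1 as the representative outcome (the formula API drops rows with missing data by default):

```python
import statsmodels.formula.api as smf

# Does LW exposure predict a "correct" answer? (one logit per question)
m_lw = smf.logit("CFARQuestion1 ~ LWexposure", data=df).fit()

# Accuracy Rate is continuous, so ordinary least squares instead.
m_acc = smf.ols("AccuracyRate ~ LWexposure", data=df).fit()

# Independence check: both predictors in one model.
m_both = smf.logit("CFARQuestion1 ~ LWexposure + Intelligence", data=df).fit()
print(m_both.summary())
```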
Every analysis gave unambiguous results (effects were either highly significant or nowhere close to significance), and to make them more presentable I turned LW exposure into a categorical variable by splitting it into thirds, setting the cutoffs manually (n’s of 389-390-389). I calculated the mean on each CFAR question for each group, and reported that (skipping the middle group), verifying that it matched the pattern of the analysis with the continuous variable. I did the same with Intelligence (n’s 303-285-289). To calculate the reported “gaps while controlling for Intelligence”, I ran multiple regressions with Intelligence as a continuous predictor variable and LW exposure as a categorical variable, and looked at the least-squares means for the 3 levels of LW exposure.
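A sketch of the split and the adjusted means, with pd.qcut standing in for the manually-set cutoffs:

```python
# Thirds of LW exposure (qcut approximates the manual split).
df["LWthird"] = pd.qcut(df["LWexposure"], 3, labels=["low", "mid", "high"])
print(df.groupby("LWthird")["CFARQuestion1"].mean())

# Least-squares means: predict each group at the mean Intelligence.
m = smf.ols("AccuracyRate ~ Intelligence + C(LWthird)", data=df).fit()
grid = pd.DataFrame({"LWthird": ["low", "mid", "high"],
                     "Intelligence": df["Intelligence"].mean()})
print(m.predict(grid))
```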
Step 7: For the anchoring question, I looked only at participants who selected “United States” as their Country (due to issues with units). To test if there was an anchoring effect, I ran a linear regression predicting the answer (question 7) based on the anchor value (question 5). First I looked at which transformations of the variables were closest to normally distributed; for question 5 that fell somewhere in between the original anchor value and sqrt(anchor), and for question 7 it fell in between sqrt(estimate) and ln(estimate+1). So I ran 4 linear regressions using each combination of those variables; they gave very similar results (statistically significant with similar R^2). Then I repeated the analysis with the most easily interpretable version of the variables, the un-transformed ones, and it gave a similar result (with very slightly lower R^2), so I reported that one (since it had the most natural interpretation, a slope which can be compared to previous research).
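A sketch of the US-only anchoring regression and one of the four transformed robustness checks, reusing the columns defined in Step 3:

```python
# Anchoring among US respondents: slope of estimate on anchor.
us = df[df["Country"] == "United States"]
m_raw = smf.ols("CFARQuestion7 ~ CFARQuestion5", data=us).fit()
print(m_raw.params["CFARQuestion5"], m_raw.rsquared)  # slope, R^2

# One of the four transformed-variable combinations:
m_t = smf.ols("estimate_sqrt ~ anchor_sqrt", data=us).fit()
```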
To test if LW exposure moderated the strength of the anchoring effect, I ran a multiple regression predicting the estimate based on three continuous variables: anchor, LW exposure, and their interaction (anchor x LW exposure). The interaction effect (which would indicate whether strongly-tied LWers showed a weaker anchoring effect) was nonsignificant. I repeated this analysis with the more-normally-distributed transformed variables to be sure, and confirmed the result. I also repeated it with Intelligence in place of LW exposure, and found the same result. For reporting the results, I just ran a simple linear regression predicting estimates from anchor values, separately for the high LW exposure third and the low LW exposure third (and the same with Intelligence).
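The moderation test is a product term in the same OLS setup; a sketch, reusing the us subset and the LWthird split from above:

```python
# Moderation: the anchor x exposure interaction is the term of interest.
m_int = smf.ols("CFARQuestion7 ~ CFARQuestion5 * LWexposure", data=us).fit()
print(m_int.pvalues["CFARQuestion5:LWexposure"])

# Reported version: simple slopes within the bottom and top thirds.
for grp in ["low", "high"]:
    sub = us[us["LWthird"] == grp]
    m = smf.ols("CFARQuestion7 ~ CFARQuestion5", data=sub).fit()
    print(grp, m.params["CFARQuestion5"])
```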
Step 8: To test if the difference between LWers and the general public is due to intelligence differences, I looked at the relationship between Intelligence and accuracy on each of the first 4 CFAR questions and extrapolated to estimate the accuracy rate of a LWer with an IQ of 100. First, I created a fake participant with a reported IQ of 100 and an IQTest of 100, and made sure they had an imputed score on the “Intelligence” composite. Then I re-ran the regression analyses (from Step 6) predicting “correct” answers for the first 4 CFAR questions based on the continuous Intelligence variable, and saved the predicted values. These are the estimates of the probability that a LWer would get that question “correct”, given their Intelligence score. I looked at these values for the fake participant; they were still noticeably larger than the accuracy rates in the original study. I repeated this analysis two more times, using reported IQ and IQTest as the predictor variable instead of using the composite Intelligence score, and got similar results.
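A sketch of the extrapolation using the raw reported-IQ variant (the composite version would first impute an Intelligence score for the fake IQ-100 participant):

```python
# Fit accuracy on the intelligence measure, then predict for a
# hypothetical respondent at IQ 100.
m_iq = smf.logit("CFARQuestion1 ~ IQ", data=df).fit()
pred = m_iq.predict(pd.DataFrame({"IQ": [100]}))
print(pred)  # estimated P(correct) for a LWer with IQ 100
```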
Step 9: I ran a few other analyses which are not part of the main storyline. I looked separately at the components of the composite predictor variables (LW Exposure & Intelligence) to see if they similarly predicted accuracy rates (they generally did). I looked for interaction effects between LW Exposure & Intelligence in predicting accuracy rates (including LW Exposure, Intelligence, and their interaction LW Exposure x Intelligence as predictor variables) - there were no significant effects. I looked at the anchoring question for the non-US LWers; first running an analysis only on the non-US group which paralleled my analysis on the US group, and then running an analysis with a US vs. non-US variable, anchor value, and their interaction as predictors of the estimate. I also ran a few other analyses in response to comments.