I don’t think that’s the issue; if you look at the graphs, the standard deviation is tiny compared to the variability between stories, and in some cases the spoiled/unspoiled bars don’t even overlap. The stats:
For all three experiments, analyses of variance revealed a significant effect of condition. (In order to control for variability between stories, we analyzed the data by comparing different versions of the same story.) Subjects significantly preferred spoiled over unspoiled stories in the case of both the ironic-twist stories (6.20 vs. 5.79), p = .013, Cohen’s d = 0.18, and the mysteries (7.29 vs. 6.60), p = .001, d = 0.34. The evocative stories were appreciated less overall, likely because of their more expressly literary aims, but subjects again significantly preferred spoiled over unspoiled versions (5.50 vs. 5.03), p = .019, d = 0.22. In all three story types, incorporating spoiler texts into stories had no effect on how much they were liked, ps > .4. Subjects also did not indicate in their free responses that they found these altered beginnings out of place or jarring.
The graphs show standard error, not standard deviation. Standard error is the standard deviation divided by the square root of the sample size. It’s included on graphs to show how precisely each mean is estimated, and hence which differences are statistically significant; it does not give a sense of the variability within a group.
Cohen’s d is measured in standard deviations (d = 0.18 means the two means are 0.18 standard deviations apart), so there is actually a lot of overlap between the groups.
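To put numbers on both points, here is a minimal sketch. The SD of 1.5 rating points is a hypothetical value chosen for illustration (the paper’s graphs don’t report it directly); the overlap formula is the standard overlap coefficient for two equal-variance normal distributions whose means differ by d standard deviations:

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

def overlap_coefficient(d):
    """Shared area of two equal-variance normals whose means are d SDs apart."""
    return 2.0 * phi(-abs(d) / 2.0)

# With n = 819 subjects, even a sizeable SD gives a tiny standard error,
# so the error bars on the graphs can be small while within-group spread is large:
print(standard_error(1.5, 819))          # ~0.05 rating points

# And d = 0.18 leaves the two distributions almost entirely overlapping:
print(round(overlap_coefficient(0.18), 2))  # ~0.93
```

So a “tiny” error bar and ~93% overlap between the spoiled and unspoiled distributions are entirely consistent with each other.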
I agree that the small standard deviation suggests either that that doesn’t happen, or that the people in question are much less prevalent than 10% of the population (a number I picked because I have ten fingers). I also suspect that the mechanism roystgnr identified is stronger than the mechanism I identified.
This study isn’t set up to differentiate between people, which is what we would need to make a warning policy.
(I had an erroneous statement about the sample size here, which I’ve deleted.)
Hmm. That looks like a memory error on my part, as rereading it I don’t see what I thought the n was (I remembered ~40). I think I saw 30 subjects, failed to multiply by 24, and it got fuzzed with the passing of time.
Small n? They used 819 subjects; that’s bigger than pretty much any psychology study cited on LW!
Thanks for the correction!