Right. In the shower, I also realized that this is just comparing averages: if, say, 10% of the population really hates spoilers, but the other 90% enjoys them enough to make the average rating for spoiling higher, it’s still sensible to post spoiler warnings as a courtesy to the 10%, because the relevant comparison is “cost of warning vs. benefit of warning,” not “spoil for everyone vs. spoil for no one.”
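To make that concrete, here’s a toy simulation (a sketch with invented numbers, not data from the study): a 10% minority that strongly dislikes spoilers can be swamped by a 90% majority that mildly enjoys them, leaving the average effect positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Invented effect sizes for illustration: 10% of readers lose 2 points
# of enjoyment when spoiled; the other 90% gain half a point.
hates_spoilers = rng.random(n) < 0.10
effect_of_spoiling = np.where(hates_spoilers, -2.0, +0.5)

print(f"mean effect of spoiling: {effect_of_spoiling.mean():+.2f}")
# ~ +0.25 on average, even though 10% of readers are made clearly worse off
```

The point is just that a positive average is compatible with a sizeable group being harmed, so the average alone can’t settle the warning-policy question.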
It’s worth noting that the experimental setup involved putting spoilers into the opening of an unfamiliar story—that is, the subjects didn’t know they were being spoiled. This suggests that if people enjoy stories less after learning spoilers, it may be their own fault (akin to all the studies on wine and perception).
I don’t think that’s the issue; if you look at the graphs, the standard deviation is tiny compared to the variability between stories, and for some stories the spoiled/unspoiled bars don’t even overlap. The stats:
For all three experiments, analyses of variance revealed a significant effect of condition. (In order to control for variability between stories, we analyzed the data by comparing different versions of the same story.) Subjects significantly preferred spoiled over unspoiled stories in the case of both the ironic-twist stories (6.20 vs. 5.79), p = .013, Cohen’s d = 0.18, and the mysteries (7.29 vs. 6.60), p = .001, d = 0.34. The evocative stories were appreciated less overall, likely because of their more expressly literary aims, but subjects again significantly preferred spoiled over unspoiled versions (5.50 vs. 5.03), p = .019, d = 0.22. In all three story types, incorporating spoiler texts into stories had no effect on how much they were liked, ps > .4. Subjects also did not indicate in their free responses that they found these altered beginnings out of place or jarring.
The graphs show standard error, not standard deviation. Standard error is standard deviation divided by the square root of the sample size. It’s included on graphs to show which differences are statistically significant—it does not give a sense of the variability within a group.
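For concreteness, a minimal sketch of the distinction (the ratings and sample size here are made up, not the study’s):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical group of n = 100 story ratings on a 1-10 scale.
ratings = rng.normal(loc=6.2, scale=1.8, size=100)
n = len(ratings)

sd = ratings.std(ddof=1)   # spread of individual ratings within the group
se = sd / np.sqrt(n)       # uncertainty in the estimate of the group mean

print(f"standard deviation: {sd:.2f}")  # ~1.8: individuals vary a lot
print(f"standard error:     {se:.2f}")  # ~0.18: the mean is pinned down tightly
```

So tight-looking error bars are consistent with wide variation among individual readers.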
Cohen’s d counts standard deviations (d=.18 means that the two means are .18 standard deviations apart), so there is actually a lot of overlap between the groups.
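To put a number on “a lot of overlap”: for two equal-variance normal distributions whose means are d standard deviations apart, the shared area is 2Φ(-d/2). A quick check with the effect sizes quoted above (the formula is a standard normal-distribution approximation, not something from the paper itself):

```python
from scipy.stats import norm

def overlap_coefficient(d: float) -> float:
    """Overlapping area of two equal-variance normal distributions
    whose means are d pooled standard deviations apart (Cohen's d)."""
    return 2 * norm.cdf(-abs(d) / 2)

for d in (0.18, 0.22, 0.34):  # the three effect sizes quoted above
    print(f"d = {d:.2f}: ~{overlap_coefficient(d):.0%} overlap")
# d = 0.18: ~93% overlap
# d = 0.22: ~91% overlap
# d = 0.34: ~87% overlap
```

Even the largest reported effect leaves the spoiled and unspoiled rating distributions overlapping almost entirely.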
I agree that the small standard deviation suggests that either that doesn’t happen or the people in question are much less prevalent than 10% of the population (a number I picked because I have ten fingers). I also suspect that the mechanism roystgnr identified is stronger than the mechanism I identified.
This study isn’t set up to differentiate between people, which is what we would need to make a warning policy.
(I had an erroneous statement about the sample size here, which I’ve deleted.)
Small n? They used 819 subjects—that’s bigger than pretty much any psychology cited on LW!
Hmm. That looks like a memory error on my part; rereading it, I don’t see what I thought the n was (I remembered ~40). I think I saw 30 subjects, failed to multiply by 24, and it got fuzzed with the passing of time.
Thanks for the correction!
This is, I think, a general problem with many of the studies we rely on. We learn a lot about the averages, but I often find that useless—I want more in-depth information about the subgroups, outliers, etc. As seen here and in the linked article, these same studies are often misinterpreted for this reason. Perhaps it’s worth a post, especially by someone more familiar with the pertinent methodology than I am.