Apologies for the impoliteness, but… man, it sure sounds like you’re searching for reasons to dismiss the study results. Which sure is a red flag when the study results basically say “your remembered experience is that AI sped you up, and your remembered experience is unambiguously wrong about that”.
Like, look, when someone comes along with a nice clean study showing that your own brain is lying to you, that has got to be one of the worst possible times to go looking for reasons to dismiss the study.
I’m not pushing back against the study results so much as against what I think are misinterpretations people are going to draw from this study.
If the claim is “on a selected kind of task, developers in early 2025 predominantly using models like Claude 3.5 and 3.7 were slowed down when they thought they were sped up”, then I’m not dismissing that. I don’t think the study is that clean or unambiguous given the methodological challenges, but I find it quite plausible.
In the above, I do only the following: (1) offer an explanation for the result, (2) point out that I individually feel misrepresented by the use of a particular descriptor, and (3) point out and affirm points the authors also make, namely (a) that this is a point-in-time result and (b) that there are selection effects at play.
You can say that if I feel now that I’m being sped up, I should be less confident given the results, and to that I say: yeah, I am. And I’m surprised by the result too.
The claim you’re making here, that I “went looking for reasons”, feels off to me. I don’t accept that whenever a result says “your remembered experience is wrong”, I’m being epistemically unvirtuous if I question it or discuss details. To repeat: I question the interpretation/generalization some might make, not the raw result or even the authors’ own interpretation, and as a participant I think I’m better positioned to notice the misgeneralization than someone just hearing the headline result (people reading the actual paper probably end up in the right place).
Fwiw, for me the calculated individual speedup was [-200%, 40%], which, while weighted predominantly toward the negative, spans both slowdown and speedup (I got these numbers from the authors after writing my above comments). I’m not sure that counts as unambiguously wrong about my remembered experience.
I think it doesn’t—our dev effects are so, so noisy!