Apologies for the impoliteness, but… man, it sure sounds like you’re searching for reasons to dismiss the study results. Which sure is a red flag when the study results basically say “your remembered experience is that AI sped you up, and your remembered experience is unambiguously wrong about that”.
Like, look, when someone comes along with a nice clean study showing that your own brain is lying to you, that has got to be one of the worst possible times to go looking for reasons to dismiss the study.
I’m not pushing back against the study results so much as against what I think are misinterpretations people are going to draw from this study.
If the claim is “on a selected kind of task, developers in early 2025 predominantly using models like Claude 3.5 and 3.7 were slowed down when they thought they were sped up”, then I’m not dismissing that. I don’t think the study is that clean or unambiguous given the methodological challenges, but I find it quite plausible.
In the above, I do only the following: (1) offer an explanation for the result, (2) point out that I individually feel misrepresented by the use of a particular descriptor, and (3) point out and affirm points the authors also make, namely (a) that this is a point-in-time result and (b) that there are selection effects at play.
You can say that if I feel now that I’m being sped up, I should be less confident given the results, and to that I say: yeah, I am. And I’m surprised by the result too.
The claim you’re making here, that I “went looking for reasons”, feels off to me. I don’t accept that whenever a result says “your remembered experience is wrong”, I’m being epistemically unvirtuous if I question it or discuss details. To repeat: I question the interpretation/generalization some might make, not the raw result or even the authors’ own interpretation, and as a participant I think I’m better positioned to notice the misgeneralization than someone just hearing the headline result (people reading the actual paper probably end up in the right place).
Fwiw, for me the calculated individual speedup was [-200%, 40%], which, while weighted predominantly toward the negative, spans both slowdown and speedup (I got these numbers from the authors after writing my above comments). I’m not sure that counts as unambiguously wrong about my remembered experience.
I think it doesn’t—our dev effects are so, so noisy!