It sounds like both the study authors themselves and many of the comments are trying to spin this study in the narrowest possible way for some reason, so I’m gonna go ahead and make the obvious claim: this result in fact generalizes pretty well. Beyond the most incompetent programmers working on the most standard cookie-cutter tasks with the least necessary context, AI is more likely to slow developers down than speed them up. When this happens, the developers themselves typically think they’ve been sped up, and their brains are lying to them.
And the obvious action-relevant takeaway is: if you think AI is speeding up your development, you should take a very close and very skeptical look at why you believe that.
I agree. I’ve been saying for a while that LLMs are highly optimized to seem useful, and people should be very cautious about assessing their usefulness for that reason. This seems like strong and unambiguous positive evidence for that claim. And a lot of the reaction does seem like borderline cope—this is NOT what you would expect to see in AI 2027-like scenarios. It is worth updating explicitly!
I think that people are trying to say “look AI is progressing really fast, we shouldn’t make the mistake of thinking this is a fundamental limitation.” That may be, but the minimal thing I’m asking here is to actually track the evidence in favor of at least one alternative hypothesis: LLMs are not as useful as they seem.
Gemini seemed useful for research and pushed me in the other direction. But lately there have been some bearish signs for LLMs (bullish for survival). Claude Opus 4 is not solving longer time-horizon tasks than o3. Agency on things like Pokémon, the experiment on running a vending machine, and NetHack is still not good. And Grok 3 is so toxic that I think this is best viewed as a capabilities problem, one I would personally expect to be solved if AGI were very near. Also, reasoning models seem to show INCREASED hallucinations.
My P(doom) has dropped back from 45% to 40% on these events.
I’m mostly cautious about overupdating here, because it’s too pleasant (and personally vindicating) a result to see. But yeah, I would bet on this generalizing pretty broadly.
I use language models to help me design systems, not by asking them to solve problems, but by discussing my ideas with them. I have an idea of how to do something, usually vague, half-formed. I use automatic speech recognition to just ramble about it, describing the idea in messy, imprecise language. The language model listens and replies with a clearer, more structured version. I read or listen to that and immediately see what’s missing, or what’s wrong, or what’s useful. Then I refine the idea further. This loop continues until the design feels solid.
The model doesn’t invent the solution. It refines and reflects what I’m already trying to express. That’s the key. It doesn’t act as an agent; it’s not writing the code or proposing speculative alternatives. It helps me pin down what I’m already trying to do, but better, faster, and with much less friction than if I were doing it alone.
I mostly don’t use autocomplete. I don’t ask for “write this function.” (Though I think there is a correct way to use these.) Instead, I might say something like: “Right now I have this global state that stores which frame to draw for an animation. But that feels hacky. What if I want to run multiple of these at the same time? Maybe I can just make it a function of time. Like, if I have a function that, given time, tells me what to draw, then I don’t need to store any state. That would probably work. Is there any reason this wouldn’t work?” And the LM will restate the idea precisely: “You’re proposing to push side effects to the boundary and define animation as a pure function of time, like in a React-style architecture.” That clarity helps me immediately refine or correct the idea.
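To make that animation example concrete, here is a minimal TypeScript sketch of the refactor being discussed; the frame count, timing constants, and function names are made up for illustration, not taken from any actual project:

```typescript
// Before: mutable global state tracks which frame to draw.
let currentFrame = 0;
function tick(): void {
  currentFrame = (currentFrame + 1) % 10; // must be called on a schedule
}
function drawStateful(): string {
  return `frame-${currentFrame}`; // depends on hidden state
}

// After: animation is a pure function of time. Nothing is stored,
// so several animations can run at once without interfering.
const FRAME_COUNT = 10;
const FRAME_DURATION_MS = 100;

function frameAt(timeMs: number): string {
  const index = Math.floor(timeMs / FRAME_DURATION_MS) % FRAME_COUNT;
  return `frame-${index}`;
}

// The render loop supplies the current time; side effects stay at the boundary.
console.log(frameAt(0));    // "frame-0"
console.log(frameAt(250));  // "frame-2"
console.log(frameAt(1050)); // "frame-0" (wrapped around)
```

The point is the shape of the change: the render loop just asks `frameAt(now)` whenever it needs to draw, so running multiple animations at once is simply a matter of calling it with different offsets.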
This changes the kind of work I can do. Without the model, I default to braindead hacking: solve local problems quickly, but end up with brittle, tangled code. Thinking structurally takes effort, and I often don’t do it. But in a conversational loop with the model, it’s fun. And because the feedback is immediate, it keeps momentum going.
This does offload cognition, but not by replacing my thinking. It’s integrated into it. The model isn’t doing the task. It’s helping me think more effectively about how to do the task. It names patterns I gestured at. It rephrases vague concepts sharply enough that I can critique them. It lets me externalize a confused internal state and get back something slightly clearer that I can then respond to. This creates an iterative improvement loop.
Maybe this works very well for me because I have ADHD. Maybe most people can just sit down and reflect in silence. For me, talking to the model lowers the activation energy and turns reflection into dialogue, which makes it very easy to do.
People say LMs slow you down. That’s true if you’re using them to write broken code from vague prompts and then patch the errors. But that’s not what I’m doing. I use them to think better, not to think less.
Similar here. For me, the greatest benefit is to have someone I can discuss the problem with. A rubber duck, Stack Exchange, pair programming—all in one. As a consequence, not only do I implement something, but I also understand what I did and why. (Yeah, in theory, as a senior developer, I should always understand what I do and why… but there is a tradeoff between deep understanding and time spent.)
So, from my perspective, this is similar to saying that writing automated tests only slows you down.
To be more precise, I do find it surprising that developers were slowed down by using AI. I just think that in the longer term it is worth using anyway.