> Many developers have been reporting that this is dramatically increasing their productivity, up to 5x’ing/10x’ing it
I challenge the data: none of my colleagues have reported this high a speed-up. I think your observation can be explained simply by strong sampling bias.
People who do not use AI, or who saw no improvement, are unlikely to report it. You also mention Twitter, where users share “hot takes” and the like to increase engagement.
It’s good to have actual numbers before we explain them, so I ran a quick search and found 3 articles that look promising (I only did a basic check of their methodology, so don’t take these numbers at face value without analyzing the sources in depth):
1. An Axify analysis of the DORA metrics they collect: https://axify.io/blog/use-ai-for-developer-productivity
Delivery throughput (-1.5%): AI adoption slightly decreases delivery throughput, usually due to over-reliance, a learning curve, and increased complexity.
Delivery stability (-7.2%): stability is significantly impacted because AI tools can generate incorrect or incomplete code, increasing the risk of production errors.
What are the DORA metrics? (A rough computation sketch follows the article list below.)
Deployment frequency | How often a team puts an item into production.
Lead time for changes | Time required for a commit to go into production.
Change failure rate | Percentage of deployments resulting in production failure.
Failed deployment recovery time | Time required for a team to recover from a production failure.
2. A McKinsey pilot study on 40 of their developers: https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai
3. An UpLevel analysis of the data of 800 developers: https://uplevelteam.com/blog/ai-for-developer-productivity
Analyzing actual engineering data from a sample of nearly 800 developers and objective metrics, such as cycle time, PR throughput, bug rate, and extended working hours (“Always On” time), they found that Copilot access provided no significant change in efficiency metrics.
The group using Copilot introduced 41% more bugs.
Copilot access didn’t mitigate the risk of burnout.
“The adoption rate is significantly below 100% in all three experiments,” the researchers wrote, “with around 30-40% of the engineers not even trying the product.”
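To make the DORA definitions above concrete, here is a minimal sketch of how they could be computed from a deployment log. The data model and numbers are made up purely for illustration; none of this comes from the articles above.

```python
from datetime import datetime, timedelta

# Hypothetical deployment log: when each change was committed, when it reached
# production, whether it caused a failure, and when service was restored if so.
deployments = [
    {"commit_at": datetime(2024, 5, 1, 9),  "deployed_at": datetime(2024, 5, 2, 10), "failed": False, "recovered_at": None},
    {"commit_at": datetime(2024, 5, 3, 14), "deployed_at": datetime(2024, 5, 6, 11), "failed": True,  "recovered_at": datetime(2024, 5, 6, 15)},
    {"commit_at": datetime(2024, 5, 7, 8),  "deployed_at": datetime(2024, 5, 9, 16), "failed": False, "recovered_at": None},
]
period_days = 30  # length of the observation window

# Deployment frequency: deployments per day over the observation window.
deployment_frequency = len(deployments) / period_days

# Lead time for changes: average time from commit to production.
lead_time = sum((d["deployed_at"] - d["commit_at"] for d in deployments), timedelta()) / len(deployments)

# Change failure rate: share of deployments that caused a production failure.
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)

# Failed deployment recovery time: average time to recover from those failures.
recovery_time = sum((d["recovered_at"] - d["deployed_at"] for d in failures), timedelta()) / len(failures)

print(deployment_frequency, lead_time, change_failure_rate, recovery_time)
```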
In my experience, LLMs are a replacement for search engines (these days, search engines are only good for finding info when you already know which website to look on …). They don’t do well in moderately sized code bases, nor in code bases with lots of esoteric business logic, which is to say they don’t do well in most enterprise software.
I mostly use them as:
- documentation that talks: it reminds me of syntax, finds functions in APIs, and generates code snippets I can play with to understand APIs
- a fuzzy version of an encyclopedia: it helps me get an overview of a subject and points me to resources where I can get crisper knowledge
- support / a better search engine for weird build issues, weird bugs, etc.
I think they’re also good at one-shot scripts, such as data wrangling and data viz, but that does not come up often in my current position.
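As an example of the kind of one-shot script I mean (the file name and columns are hypothetical, just to show the scale of task an LLM usually handles well in one go):

```python
# One-shot data wrangling + viz: aggregate a CSV by month and plot it.
# "sales.csv" and its "date"/"amount" columns are made up for illustration.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv", parse_dates=["date"])
monthly = df.groupby(df["date"].dt.to_period("M"))["amount"].sum()

monthly.plot(kind="bar", title="Monthly sales")
plt.tight_layout()
plt.savefig("monthly_sales.png")
```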
I would rate the productivity increase at about 10%. I think the use of modal editors (Vim or modern alternatives) improves coding speed more than inline AI completion, which is often distracting.
A lot of my time is spent understanding what I need to code, in a back-and-forth with the product manager or clients. Then I can either code it myself (it’s not so hard once I understand the requirements) or spend time explaining to the AI what it needs to do, watch it write sloppy code, and rewrite the thing.
Once in a while the coding part is actually the hard part, for example when I need to make the code fast or make a feature play well with the existing architecture. But then the AI can’t do logic well enough to optimize code, nor can it reason about the entire code base, which it can’t see anyway.