I feel like people are dismissing this study out of hand without updating appropriately. If there’s at least a chance that this result replicates, that should shift our opinions somewhat.
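To make "shift our opinions somewhat" concrete, here is a toy expected-value calculation. Every number in it is an illustrative assumption, not a figure from the study:

```python
# Toy update with purely illustrative numbers (none of these come from the study).
p_replicates = 0.30    # assumed credence that the slowdown result replicates
slowdown = -0.20       # assumed productivity effect if it does replicate
prior_speedup = 0.25   # assumed prior belief about the speedup from AI tools

# Expected effect mixes the two scenarios by their probabilities.
expected = p_replicates * slowdown + (1 - p_replicates) * prior_speedup
print(f"Expected productivity effect: {expected:+.1%}")  # +11.5%, well below +25%
```

Even with only 30% weight on the slowdown scenario, the expected benefit falls to less than half of the prior estimate; the study doesn't have to be certainly right to matter.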
First, a few reasons why the common counterarguments aren’t strong enough to dismiss the study:
I’ve been seeing arguments against this result based on vibes, or on claims that the next generation of LLMs will overturn it. But that is directly contradicted by the results of this study: people’s feelings turn out to be poor indicators of their actual productivity.
On Cursor experience, I think Joel Becker had a reasonable response here. Essentially, many of the coders had tried Cursor, had some experience with it, and had a lot of experience using LLMs for programming. Is the learning curve really so steep that we shouldn’t see them improve across the many tasks? See the image below. Perhaps the fact that these programmers hadn’t adopted Cursor, and saw little improvement with it, is itself a sign that Cursor isn’t very helpful.
While this is a challenging environment for LLM coding tools, it is exactly the sort of environment where I want to see improvement if AI is to have a transformative impact on coding. Accelerating experienced devs is where much of the value of automating coding will come from.
That aside, how should we update our opinions in light of the study?
Getting AI to be useful in a particular domain is tricky: you have to actually run tests and establish good practices (see the sketch after these points).
Anecdotes about needing discipline to stay on task with coding tools, and about the Cursor learning curve, suggest that AI adoption has real frictions and requires tacit knowledge to use well.
Coding is one of the cleanest, most data-rich, most LLM-developer-supported domains. Even here, AI automation is not yet a slam dunk. Every other domain will require its own iteration, testing, and practice before seeing a benefit.
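To make the "actually run tests" point concrete, here is a minimal sketch of a self-experiment: randomize which tasks you do with the AI tool and which without, then check whether the timing difference is bigger than chance. Everything here, including the task timings, is made up for illustration.

```python
import random
import statistics

# Hypothetical sketch: measure whether an AI tool helps on your own tasks,
# rather than trusting vibes. Each entry: (minutes to complete, AI allowed?).
results = [
    (42, True), (35, False), (58, True), (40, False),
    (31, True), (29, False), (66, True), (51, False),
]

with_ai = [t for t, ai in results if ai]
without_ai = [t for t, ai in results if not ai]
obs_diff = statistics.mean(with_ai) - statistics.mean(without_ai)

# Simple permutation test: shuffle the AI/no-AI labels and count how often
# a difference at least this large appears by chance.
times = [t for t, _ in results]
n_ai = len(with_ai)
trials, extreme = 10_000, 0
for _ in range(trials):
    random.shuffle(times)
    diff = statistics.mean(times[:n_ai]) - statistics.mean(times[n_ai:])
    if abs(diff) >= abs(obs_diff):
        extreme += 1

print(f"Mean difference (AI - no AI): {obs_diff:+.1f} min")
print(f"Permutation p-value: {extreme / trials:.3f}")
```

With only a handful of tasks the test will rarely reach significance, which is itself the point: establishing whether a tool helps takes deliberate measurement, not impressions.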
If this holds, the points above will slow AI diffusion, particularly where AI is used as a tool by humans. Modelling the impact of current and near-future AIs should take this into account.