In the spirit of posting more on-the-ground impressions of capability: in my fairly simple front-end coding job, I’ve gone in the past year from writing maybe 50% of my code with AI to maybe 90%.
My job for the past couple of months has looked like this: attend meetings to work out project requirements; break those requirements into a more specific sequence of tasks for the AI- often just three or four prompts with a couple of paragraphs of explanation each; run through those in Cursor, reviewing the changes and making usually pretty minor edits; test- which, in recent weeks, almost never reveals errors introduced by the AI itself; and finally push the code out to the repos.
Most of the edits I make have to do with the models’ reluctance to delete code- so, for example, if a block of code in function A needs to be moved into its own function so that functions B and C can call it, the AI will often just repeat the code block in B and C so that it doesn’t have to delete anything in A. It also sometimes comes up with strange excuses to avoid deleting code that’s become superfluous.
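To make the pattern concrete, here's a minimal TypeScript sketch (all the function names are hypothetical): the intended refactor pulls the shared block into its own function, while the version the model tends to produce leaves A's inline block alone and pastes copies into B and C.

```typescript
// Hypothetical names throughout. The intended refactor: pull the shared
// block out of A so that B and C can call it too.
function toDollars(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

function renderPrice(cents: number): string {      // function A, after the move
  return toDollars(cents);
}

function renderDiscount(cents: number): string {   // function B
  return `Save ${toDollars(cents)}`;
}

// What the model often produces instead: it leaves A's inline block
// untouched and pastes a fresh copy of it into B (and C).
function renderDiscountDuplicated(cents: number): string {
  return `Save $${(cents / 100).toFixed(2)}`;      // duplicate of A's logic
}
```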
The models also occasionally have an issue where they’ll add fallbacks to prevent functions from returning an error even when they really should return an error, such as when a critical API call returns bad data.
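A sketch of that failure mode, with a made-up /api/users endpoint: the first version swallows the failure behind a placeholder so the function "never fails", while the second surfaces the error the way a critical call should.

```typescript
interface User { id: string; name: string; }

// The fallback pattern the models sometimes add: the function can
// "never fail", so bad data flows silently downstream.
async function getUserWithFallback(id: string): Promise<User> {
  try {
    const res = await fetch(`/api/users/${id}`);   // hypothetical endpoint
    return (await res.json()) as User;
  } catch {
    return { id, name: "Unknown" };                // masks the real failure
  }
}

// What a critical call should do instead: surface the error.
async function getUser(id: string): Promise<User> {
  const res = await fetch(`/api/users/${id}`);
  if (!res.ok) throw new Error(`user fetch failed: HTTP ${res.status}`);
  const user = (await res.json()) as User;
  if (!user || typeof user.id !== "string") {
    throw new Error("user fetch returned malformed data");
  }
  return user;
}
```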
So, in a way, the main bottleneck to the AI doing everything one-shot at this point seems to be alignment rather than capability- the models were trained to avoid errors and avoid deleting code, and they care more about those goals than about producing good codebases. That said, these issues almost never actually produce bugs, and dealing with them is arguably more stylistic than functional.
In my department, I think all of the other developers are using AI in the same way- judging by how the style of the code they’ve been deploying has changed recently- but nobody talks about it. It’s treated almost like an embarrassing open secret, like people watching YouTube videos while on the clock, and I think everyone’s afraid that if the project managers ever get a clear picture of how much the developers are acting like PMs for AI, the business will start cutting jobs.
the AI will often just repeat the code block in B and C so that it doesn’t have to delete anything in A
Some human devs do this too. In the short term it reduces the likelihood of breaking things because something you weren't aware of relied on the old version. In the long term it makes changes harder, because now, if you want to change the logic, instead of changing it in one place you have to change it in n similar but usually not identical places- and if those places are different in ways that affect the implementation, you have to make an informed guess about whether they're different on purpose, and if so, why. Down that path lies madness.
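A small hypothetical of where that path ends up: two copies that have quietly drifted, with nothing in the code to tell you whether the differences are intentional.

```typescript
// Two near-duplicates discovered months later. Same shape, but one
// trims its input and uses a stricter regex. Deliberate, or drift?
// The code can't tell you.
function validateEmailSignup(email: string): boolean {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email.trim());
}

function validateEmailInvite(email: string): boolean {
  return /^[^@]+@[^@]+$/.test(email);
}
```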
I’ve gone in the past year from writing maybe 50% of my code with AI to maybe 90%.
I’m curious what fraction of your non-boilerplate, non-test code that ends up in production is AI-generated. Do you review it manually?
At this point probably >95% of the code I cause to be written is AI-generated. Most of the AI-generated code is exploratory[1] or rote[2], though. About 75% of the code I merge is AI-generated, but most of that is either boilerplate or tests[3]; only 20% or so of the non-boilerplate, non-test code that makes it onto prod is AI-generated.
In any case, I should probably make a top-level shortform to this effect, since this one got a lot more engagement than I was expecting—it was intended to be “I tried to get a measurement of how much AI can help me with maintenance work, and the attempt failed in an entertaining way” with a side of “I don’t think I’m going to be substantially replaced by clopus just yet”, but I have a bad feeling people are over-updating to “LLMs don’t help with programming”, which is not my experience at all.
[1] e.g. mocks of what a flow could look like, or comparing lots of different alerting thresholds against historical data to see how I want to configure alarms.
[2] DI wiring, includes, docblocks, that sort of thing. Basically the stuff I had keyboard shortcuts to fill in for me in 5 keystrokes in the days before AI.
[3] “write tests” is, by far, the area where I get the most value out of current LLM coding agents. I rarely have the time or energy to write the test suite I really want by hand, but I do have enough time to rattle off a few dozen “when x happens and then y happens, z should be true” style cases, and LLM coding agents can usually come up with more. Test code is also very tolerant of copy/paste/modify (so I’m willing to say “look at this example, and copy shamelessly from it to fit your needs”), and it’s much more tolerant of bad code than user-facing code is, since rewrites are low-risk and will generally break in obvious ways if they break. Between these factors, I am usually quite happy to ship LLM-written tests.
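For illustration, a sketch of what those rattled-off cases might look like in Jest-style syntax (clampDiscount is a made-up function under test; describe/it/expect are the usual Jest globals):

```typescript
// Made-up function under test.
function clampDiscount(percent: number): number {
  return Math.min(100, Math.max(0, percent));
}

describe("clampDiscount", () => {
  it("passes in-range values through unchanged", () => {
    expect(clampDiscount(25)).toBe(25);
  });
  it("clamps negative discounts to zero", () => {
    expect(clampDiscount(-5)).toBe(0);
  });
  it("caps discounts at 100", () => {
    expect(clampDiscount(250)).toBe(100);
  });
});
```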