Writing tests (in Python). Writing comprehensive tests for my code used to take a significant portion of my time. Probably at least 2x more than writing the actual code, and subjectively a lot more. Now it’s a matter of “please write tests for this function”, “now this one” etc., with an extra “no, that’s ugly, make it nicer” every now and then.
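To make this concrete, here's a sketch of the kind of exchange this workflow produces. Both the function and the tests are invented for illustration, not from the original post; this is roughly what "please write tests for this function" tends to yield after one round of "make it nicer".

```python
# Hypothetical function under test.
def normalize_scores(scores: list[float]) -> list[float]:
    """Scale a list of non-negative scores so they sum to 1."""
    total = sum(scores)
    if total == 0:
        raise ValueError("scores must not sum to zero")
    return [s / total for s in scores]


# The kind of tests an LLM produces on request (plain asserts here,
# so the sketch stays self-contained without pytest).
def test_normalizes_to_one():
    result = normalize_scores([1.0, 1.0, 2.0])
    assert abs(sum(result) - 1.0) < 1e-9
    assert result == [0.25, 0.25, 0.5]


def test_rejects_zero_total():
    try:
        normalize_scores([0.0, 0.0])
    except ValueError:
        pass  # expected
    else:
        raise AssertionError("expected ValueError for zero total")


if __name__ == "__main__":
    test_normalizes_to_one()
    test_rejects_zero_total()
```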
Working with simple code is also a lot faster, as long as it doesn’t have to process too much. So most of what I do now is make sure the file it’s processing isn’t more than ~500 lines of code. This has the nice side effect of forcing me to keep the code in small, logical chunks. Cursor can often handle most of what I want, after which I tidy up and make things decent. I’d estimate this makes me at least 40% faster at basic coding, probably a lot more. Cursor can in general handle quite large projects if you manage it properly. E.g. last week it took me around 3 days to make a medium-sized project with ~14k lines of Python code. This included Docker setup stuff (not hard, but fiddly), a server + UI (the frontend is rubbish, but that’s fine), and some quite complicated calculations. Without LLMs this would have taken at least a week, and probably a month.
Debugging data dumps is now a lot easier. I ask Claude to make me throwaway HTML pages to display various stuff. Ditto for finding anomalies. It won’t find everything, but it can find a lot. All of this can be done with the appropriate tooling, of course, but that requires knowing about it, having it set up, and knowing how to use it.
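A minimal sketch of the "throwaway HTML page" idea: dump a list of records into a table you can eyeball in a browser. The field names and data are invented for illustration; the point is how little code a disposable viewer needs.

```python
import html


def records_to_html(records: list[dict]) -> str:
    """Render a list of dicts as a bare-bones HTML table for eyeballing."""
    if not records:
        return "<p>No records.</p>"
    cols = list(records[0])
    head = "".join(f"<th>{html.escape(c)}</th>" for c in cols)
    rows = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(r.get(c, '')))}</td>" for c in cols)
        + "</tr>"
        for r in records
    )
    return f"<table border='1'><tr>{head}</tr>{rows}</table>"


if __name__ == "__main__":
    # Hypothetical data dump; open dump.html in a browser to inspect it.
    demo = [{"id": 1, "value": 3.2}, {"id": 2, "value": -17.0}]
    with open("dump.html", "w") as f:
        f.write(records_to_html(demo))
```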
Glue code or in general interacting with external APIs (or often also internal ones) is a lot easier, until it’s not. You can often one-shot a workable solution that does exactly what you want, which you can then just modify to not be ugly.
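The kind of glue code that tends to one-shot well looks something like the following: paging through a JSON endpoint and flattening the results. The endpoint, parameter names, and response shape here are all invented for illustration; the fetch function is injectable so the paging logic can be exercised without a network.

```python
import json
import urllib.request

API_URL = "https://api.example.com/items"  # hypothetical endpoint


def fetch_page(page: int, fetch=None) -> dict:
    """Fetch one page of results; `fetch` is injectable for offline testing."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
    return fetch(f"{API_URL}?page={page}")


def fetch_all_items(fetch=None) -> list:
    """Follow the (hypothetical) `has_more` flag until the API is exhausted."""
    items, page = [], 1
    while True:
        data = fetch_page(page, fetch=fetch)
        items.extend(data["items"])
        if not data.get("has_more"):
            return items
        page += 1
```

The "until it's not" part usually shows up in the details a one-shot misses: auth refresh, rate limits, retries — which is where the hand-tidying comes in.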
I’m not sure how more productive I am with LLMs. But that’s mainly because coding is not all I do. If I was just given a set of things to make and was allowed to crank away at it, then I’m pretty sure I’d be 5-10x faster than two years ago.
thanks for concrete examples, can you help me understand how these translate from individual productivity to externally-observable productivity?
3 days to make a medium sized project
I agree Docker setup can be fiddly. However, what happened with the 50+% savings — did you lower the price for the customer to stay competitive, do you do 2x as many paid projects now, did you postpone hiring another developer who is not needed now, or do you just have more free time? No change in support&maintenance costs compared to similar projects before LLMs?
processing isn’t more than ~500 lines of code
oh well, my only paid experience is with multi-year project development&maintenance, those are definitely not in the category under 1kloc 🙈 which might help to explain my abysmal experience trying to use any AI tools for work (beyond autocomplete, but IntelliSense also existed before LLMs)
TBH, I am now moving towards the opinion that evals are very unrepresentative of the “real world” (if we exclude LLM wrappers as requested in the OP … though LLM wrappers including evals are becoming part of the “real world” too, so I don’t know — it’s like banking bootstrapped wealthy bankers, and LLM wrappers might be bootstrapping wealthy LLM startups)
I can do more projects in parallel than I could have before. Which means that I have even more work now… The support and maintenance costs of the code itself are the same, as long as you maintain constant vigilance to make sure nothing bad gets merged. So the costs are moved from development to review. It’s a lot easier to produce thousands of lines of slop which then have to be reviewed and loads of suggestions made. It’s easy for bad taste to be amplified, which is a real cost that might not be noticed that much.
There are some evals which work on large codebases (e.g. “fix this bug in django”), but those are the minority, granted. LLMs can help with the scaffolding, though — those tend to be large projects in which a Claude can help find things.
But yeah, large files are ok if you just want to find something, but somewhere under 500 loc seems to be the limit of what will work well for edits. Though you can get round it somewhat by copying the parts to be changed to a different file and then copying them back, or other hacks like that...