I use Claude for some hobby coding at home. I find it very useful in the following situations:
- “I am too lazy to study the API, just give me the code that loads the image from the file or whatever.”
- “How would you do this? Suggest a few alternatives.” (often its preferred one is not the one I choose)
- “Tell me more about this topic.”
So it saves me a lot of time reading the documentation, and sometimes it suggests a cool trick I probably wouldn’t find otherwise. (For example, in Python list comprehension, you can create a “local variable” by iterating through a list containing one item.)
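(A minimal sketch of that trick; `normalize` and `words` are just illustrative stand-ins, not anything from the thread:)

```python
# The "local variable in a list comprehension" trick: iterating over a
# one-element list binds the intermediate result to a name, so the
# (possibly expensive) call runs only once per item and can be reused
# in both the filter and the output expression.
def normalize(s):
    return s.strip().lower()

words = ["  Foo ", "BAR", "   "]

cleaned = [w for raw in words for w in [normalize(raw)] if w]
print(cleaned)  # ['foo', 'bar']
```

(In Python 3.8+ the walrus operator does the same job, but the one-element-list version works everywhere.)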
On the other hand, if I let Claude write the code, it is often much longer than what I end up with when I think about the problem myself and only let Claude give me short pieces of code.
Maybe I am using it the wrong way. But that too is a kind of risk: having Claude conveniently write the code for me encourages me to produce a lot of code, because it is cheap. But as Dijkstra famously put it, lines of code should be counted not as “lines produced” but as “lines spent”.
I get the feeling that it can remove blockers. Your effectiveness (however you want to define that) at programming is like Liebig’s barrel, or maybe a better example would be a road with a bunch of big rocks and rubble on it—you have to keep removing things to proceed. The more experienced you are, the better at removing obstacles, or even better, avoiding them. LLMs are good at giving you more options for how to approach problems. Often their suggestion is not the best, but it’s better than nothing and will suffice to move on.
They do like making complex stuff, though… I wonder if that’s because there’s a lot more bad code that looks good than just good code?
Yeah, they can be used as a way to get out of procrastination/breach ugh fields/move past the generalized writer’s block, by giving you the ability to quickly and non-effortfully hammer out a badly-done first draft of the thing.
Wild guess: it’s because AGI labs (and Anthropic in particular) are currently trying to train them to be able to autonomously spin up large, complex codebases, so much of their post-training consists of training episodes where they’re required to do that. So they end up biased towards architectural choices suitable for complex applications instead of small ones.
Like, given free rein, they seem eager to build broad foundations on which lots of other features could later be implemented (n = 2), even if you give them the full spec and it only involves 1-3 features.
This would actually be kind of good – building a good foundation that won’t need refactoring if you later fancy a new feature! – except they’re, uh, not actually good at building those foundations.
Which, if my guess is correct, is because their post-training always pushes them to the edge of, and then slightly past, their abilities. Like, suppose the post-training uses rejection sampling (training on episodes in which they succeeded), and it involves a curriculum spanning everything from small apps to “build an OS”. If so, much like in METR’s studies, there’ll be some point of program complexity where they only succeed 50% of the time (or less). But they’re still going to be trained on those 50%-successful trajectories. So they’ll end up “overambitious”/“overconfident”, trying to do things they’re not quite reliable at yet.
Or maybe not; wild guess, as I said.
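(A toy sketch of that selection effect; purely my own illustration, with an assumed sigmoid success model and made-up numbers, not anything from METR or the labs:)

```python
# Toy simulation of rejection sampling on successful episodes only.
# The sigmoid success model, skill level, and curriculum range are
# assumptions made purely for illustration.
import math
import random

random.seed(0)

def success_prob(difficulty, skill=5.0):
    # Chance the model completes a task of the given difficulty.
    return 1.0 / (1.0 + math.exp(difficulty - skill))

# Curriculum spans easy (difficulty 1) to very hard (difficulty 10) specs.
episodes = []
for _ in range(20_000):
    d = random.uniform(1.0, 10.0)
    episodes.append((d, random.random() < success_prob(d)))

# Rejection sampling: keep only the trajectories that succeeded.
kept = [d for d, ok in episodes if ok]

# A chunk of the kept training data still comes from tasks the model
# only solves half the time or less -- the "overambitious" regime.
shaky = [d for d in kept if success_prob(d) <= 0.5]
print(f"kept episodes: {len(kept)}")
print(f"share from <=50%-reliability tasks: {len(shaky) / len(kept):.0%}")
```

The exact share depends on the assumed curriculum and skill level; the point is just that filtering for success doesn’t filter out the attempts at the edge of reliability.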
Maybe the sufficient reason is that LLMs learn from the bad code as well as the good code. And from old code, written before some powerful features were introduced to the language or the libraries, so it doesn’t use them. They also learn from automatically generated code, if someone commits that to the repository.
Yup, it certainly seems useful if you (a) want to cobble something together using tools you don’t know (and which is either very simple, or easy to “visually” bug-test, or which you don’t need to work extremely reliably), or (b) want to learn some tools you don’t know, so you use the LLM as a chat-with-docs interface.