I get the feeling that it can remove blockers. Your effectiveness (however you want to define that) at programming is like Liebig’s barrel, or maybe a better example would be a road with a bunch of big rocks and rubble on it—you have to keep removing things to proceed. The more experienced you are, the better you get at removing obstacles, or, even better, at avoiding them. LLMs are good at giving you more options for how to approach problems. Often their suggestion is not the best, but it’s better than nothing and will suffice to move on.
They do like making complex stuff, though… I wonder if that’s because there’s a lot more bad code that looks good than just good code?
Yeah, they can be used as a way to get out of procrastination/breach ugh fields/move past the generalized writer’s block, by giving you the ability to quickly and non-effortfully hammer out a badly-done first draft of the thing.
Wild guess: it’s because AGI labs (and Anthropic in particular) are currently trying to train them to be able to autonomously spin up large, complex codebases, so much of their post-training consists of training episodes where they’re required to do that. So they end up biased towards architectural choices suitable for complex applications instead of small ones.
Like, given free rein, they seem eager to build broad foundations on which lots of other features could later be implemented (n = 2), even if you give them the full spec and it only involves 1–3 features.
This would actually be kind of good – building a solid foundation that won’t need refactoring if you end up fancying a new feature later! – except they’re, uh, not actually good at building those foundations.
Which, if my guess is correct, is because their post-training always pushes them to the edge of, and then slightly past, their abilities. Like, suppose the post-training uses rejection sampling (training only on episodes in which they succeeded), and it involves a curriculum spanning everything from small apps to “build an OS”. If so, much like in METR’s studies, there’ll be some point of program complexity where they only succeed 50% of the time (or less). But they’re still going to be trained on the successful trajectories from those 50%-success tasks. So they’ll end up “overambitious”/“overconfident”, trying to do things they’re not quite reliable at yet.
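To make the mechanism concrete, here’s a toy sketch of what I mean by rejection sampling over a curriculum. This is purely illustrative Python; the success model, the difficulty numbers, and the function names are my own assumptions, not anything any lab has published. The point is just that tasks near the model’s 50%-success frontier still contribute lots of successful trajectories to the training set:

```python
import random

# Toy model of rejection-sampling post-training (an illustrative assumption, not a real pipeline):
# the model attempts tasks across a curriculum, and only *successful* episodes are kept for training.

def attempt(skill: float, complexity: float) -> bool:
    """Stand-in for one rollout: success gets less likely as complexity exceeds skill."""
    p_success = 1.0 / (1.0 + 2.0 ** (complexity - skill))
    return random.random() < p_success

def collect_episodes(skill: float, curriculum: list[float], rollouts: int = 1000) -> dict[float, int]:
    """Count how many successful episodes each task difficulty contributes to the training set."""
    kept = {c: 0 for c in curriculum}
    for complexity in curriculum:
        for _ in range(rollouts):
            if attempt(skill, complexity):
                kept[complexity] += 1  # in reality: store the full successful trajectory
    return kept

if __name__ == "__main__":
    curriculum = [1.0, 3.0, 5.0, 7.0, 9.0]  # "small script" ... "build an OS"
    counts = collect_episodes(skill=5.0, curriculum=curriculum)
    for complexity, n in counts.items():
        print(f"complexity {complexity}: {n} successful episodes kept")
    # Tasks at the ~50% frontier (complexity ≈ 5 here) still supply hundreds of training
    # episodes, so the model keeps getting reinforced on work it can't yet do reliably.
```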
Or maybe not; wild guess, as I said.
Maybe a sufficient reason is that LLMs learn from bad code as well as good code. And from old code, written before some powerful features were introduced to the language or the libraries, so it doesn’t use them. They also learn from automatically generated code, if someone commits that to the repository.