I’ve previously left a comment describing how I was fairly unimpressed with Claude Code Opus 4.5, and offering a few guesses about what causes the difference in people’s opinions regarding its usefulness. Eight more days into it, I have new comments and new guesses.
tl;dr: Very useful (or maybe I’m also deluding myself), very hard to use (or maybe I have skill issues). If you want to build something genuinely complicated with it, it’s probably worth it, but it will be an uphill battle against superhuman-speed codebase rot, and you will need significant technical expertise and/or the ability to learn that expertise quickly.
First, what I’m attempting is to use it to implement an app that’s not really all that complicated, but which is still pretty involved, still runs fairly nontrivial logic at the backend (along the lines of this comment), and about whose functionality I have precise desiderata and very few vibes.
Is Opus 4.5 helpful and significantly speeding me up? Yes, of that there is no doubt. Asking it questions about which packages to use, what tools they offer, and how I could architect solutions to the various problems I run into is incredibly helpful. Its answers are so much more precise than Google’s, it distills information so much better than raw code documentation does, and it’s both orders of magnitude faster than StackOverflow and smarter than the median answer you’d get there.
Is Claude Code helpful and speeding me up? That is more of an open question. Some loose thoughts:
Once Opus 4.5 and I have nailed down the architecture, it has to actually be put into code. If I’ve made the correct call regarding the scope, Claude Code can be mostly trusted with the implementation as well.
What is “the correct call regarding the scope”? I’m still gauging this, but some pointers:
You can trust it to implement an algorithm that traverses a complicated but well-specified data structure and transforms it in some way (a sketch of what I mean is a few lines below).
You can trust it to implement the infrastructure for trivial data roundtrips through nontrivial architectures (e. g., taking a user action and propagating it across several nested concern boundaries).
You cannot trust it to do (1) and (2) together: to implement the data-roundtrip infrastructure if it requires implementing, rather than only navigating, some complicated logic.
Which is to say: the scope has to be pretty small. Smaller than you’d think from all the hype. No, even smaller than that.
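To make the first of those pointers concrete, here’s a made-up example of the kind of well-specified traverse-and-transform task I’m comfortable handing over (the data shape and function name are invented for illustration):

```python
# Hypothetical example: collapse a nested "section tree" into a flat list of
# (path, text) pairs. The input shape is fully specified and the transform is
# purely mechanical -- the kind of task that can be delegated safely.
def flatten_sections(node, path=()):
    current = path + (node["title"],)
    rows = [("/".join(current), node.get("text", ""))]
    for child in node.get("children", []):
        rows.extend(flatten_sections(child, current))
    return rows

tree = {
    "title": "root",
    "text": "intro",
    "children": [
        {"title": "a", "text": "first"},
        {"title": "b", "children": [{"title": "b1", "text": "nested"}]},
    ],
}
print(flatten_sections(tree))
# [('root', 'intro'), ('root/a', 'first'), ('root/b', ''), ('root/b/b1', 'nested')]
```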
At which point: do I actually get speed-ups from delegating the implementation to Claude, as opposed to interrogating the chatbot version of Opus 4.5 about how it could be implemented, then doing it manually (or with Cursor’s sometimes-helpful, sometimes-frustrating autocomplete assist)?
On one hand, this does compromise learning. I don’t develop as precise a sense of how specific tools can be used/built as I would have if I were forced to think through the lower-level details manually. This is potentially bad in the very long term.
On the other hand, this removes a fair amount of mental strain and frustration. The alternative to my babbling a feature to Claude may not be “I interrogate Opus 4.5 and write the code manually”; it may be “I go watch a movie, because I don’t have it in me this late in the day”. And, really, would it be helpful for me to learn the minutiae of how this one-off Python package I’ll never use again works? (Perhaps!)
Which is to say...
Over the short term, and for short-term projects, this definitely provides a speed-up.
Over the medium term, e. g. 4 weeks, or projects of similar scope? Dunno.
Over the long term, e. g. months/years, or projects of similar scope? Dunno.
Regarding bug-fixing:
It’s often advised to use test-driven development with LLMs, because it keeps them honest and is supposedly so easy: just ask the LLM to write the tests beforehand, glance over them to make sure they’re solid, and off you go! This is not good advice, in my experience. It’s a waste of time and tokens. LLMs don’t write good tests, and fixing those up takes longer than either doing the thing manually or having the LLM generate the feature raw and then working out the kinks after the fact.
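To illustrate what I mean (a made-up example, with a hypothetical parse_config as the function under test): the flavor of test I tend to get back exercises the happy path and asserts things that can barely fail, while the invariants I actually care about go untested.

```python
# Made-up illustration of a typical LLM-written test: several assertions,
# none of which pin down the behavior that actually matters.
def test_parse_config(tmp_path):  # tmp_path is pytest's temporary-directory fixture
    cfg_file = tmp_path / "config.toml"
    cfg_file.write_text("[server]\nport = 8080\n")
    cfg = parse_config(cfg_file)  # hypothetical function under test
    assert cfg is not None
    assert isinstance(cfg, dict)
    assert "server" in cfg
    # Nothing here checks port parsing, defaults, or error handling.
```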
(You can try asking the LLM to code up some domain-specific interface for you instead, so that you get a firehose of data on what you’re coding and on any bugs that transpire… But honestly, I wish someone would just vibe-code some non-terrible general-purpose call-graph debugging tools instead. Maybe I should try doing it myself.)
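For reference, a very rough sketch of the kind of call-graph tracing I mean, using only the standard library’s sys.settrace; the project-path marker and entry point are placeholders, and a real tool would need filtering, argument capture, and a decent viewer:

```python
import sys

def make_tracer(log, project_marker):
    depth = 0
    def tracer(frame, event, arg):
        nonlocal depth
        if event == "call":
            # Only log frames whose file path contains the marker, so stdlib
            # and third-party internals don't drown out the project's calls.
            if project_marker in frame.f_code.co_filename:
                log.append("  " * depth + frame.f_code.co_name)
            depth += 1
        elif event == "return":
            depth -= 1
        return tracer  # keep tracing nested frames
    return tracer

log = []
sys.settrace(make_tracer(log, project_marker="my_project"))  # placeholder path fragment
try:
    run_the_buggy_code_path()  # placeholder for whatever you're debugging
finally:
    sys.settrace(None)
print("\n".join(log))  # indented call tree of the project's own functions
```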
If the LLM-generated code has bugs in it, can you make them go away by describing the bug to the LLM and asking it to fix it? Sometimes. Often no, and often it takes longer than going looking for the bug directly.
A broad issue with LLMs is that their activity in the codebase is always… “entropy-increasing”. Almost every time they take action, they nudge some stuff out of place: they add unnecessary features/functions, unnecessary comments, unnecessary guardrails for states the program can’t enter, or they implement something, change their mind, and then fail to clean the unused thing back up. This happens even if you ask the LLM to go tighten things up/see if the codebase can be streamlined. It also means that if an LLM doesn’t one-shot a bug, and instead tries A, then B, then C, it leaves behind a trail of failed bug-fixes which may or may not be screwing something else up.
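A made-up but representative example of the “unnecessary guardrails” pattern: the call site constructs the list itself, so none of the defensive scaffolding below can ever trigger.

```python
# What was asked for: sum up the line items.
# What comes back (invented for illustration):
def total_price(items):
    # Validate input to ensure robustness.
    if items is None:
        return 0.0
    if not isinstance(items, list):
        raise TypeError("items must be a list")
    total = 0.0
    for item in items:
        # Skip malformed entries gracefully.
        if item is None:
            continue
        total += item.price
    return total

# Meanwhile, the only call site always passes a freshly built list of items,
# so the None/type/malformed-entry branches are dead weight to maintain.
```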
So if you want to build something nontrivially complex which can serve as a foundation for further complexity, you have to periodically go in and manually straighten things out.
(Sidenote: I did not know about code-simplifier prior to reading the OP. I’ll try it, I guess.)
Sometimes, a significant factor is simply that the LLM lacks context:
It’s not really possible to convey your full intent to it, both because language is too low-bandwidth and because of context-window limitations.
It’s often unable to bug-fix efficiently just because it can’t see what’s going wrong. (I guess I can screenshot things, but I know Claudes don’t have great vision.)
What’s up with people who run twelve hundred simultaneous Claude sessions? Are they all LARPers?
Maybe.
Maybe not? After I’ve chewed the project up into bite-sized pieces, I do end up having to wait a while on Claude while it executes. If I had a better sense of what Claude can handle, and how to babysit it, and had the mental habit of breaking projects up into parallelizable instead of sequential implementation steps… if I’d spent more than a week figuring this out… I can maybe see it?
Is it possible to “vibe-code”?
Not really, no.
The moment you lose track of what’s going on and start relying on trust and “vibes”, the moment you make the wrong call about the scope, it’s over for you: Claude runs off and spills a bunch of spaghetti all over the codebase. If that happens, you shouldn’t even try to fix it; just roll it back, chew the task up into finer pieces, and redo it.
More broadly, if you don’t have the technical mindset to break the project down into small pieces that could be implemented and tested independently, and you can’t quickly get up to speed if you run into some novel technical/architectural problem that requires a thought-out technical solution, I don’t think you’re getting anywhere, no.
(You can attain the needed competence more rapidly than ever before via asking Opus 4.5 as well, but, like, you do need to attain that competence, at which point are you really “vibe-coding”?)
What’s up with the people who claim they can vibe-code, that their workflows look like this picture where they sit back and relax and Claude does everything? My guesses in increasing order of respectability:
They’re doing fake work, or are straight-up just attention-seeking AI influencers.
They’re deluding themselves and their projects are going to crash and burn.
They’re not doing anything actually nontrivially complicated, for certain values of “nontrivially complicated”. See the footnote in my previous comment.
See also this as a concrete example. Sure, it sounds impressive when the guy frames it as impressive, but I’m also getting the sense it’s just plugging a bunch of APIs into each other.
They’re drastically underestimating how much metis/know-how goes into their workflows, how much they really babysit LLMs, and how very far LLMs are from real autonomy.
Is Claude Code on track to replace software engineers? Ha! No. It does not have a “4-hour time horizon”. 20 minutes, maybe. Whatever METR’s graph is measuring, I currently don’t feel very daunted by the prospect of it continuing to go exponential. I may eat my words here, sure.
Is Rust a perfect language for agents? Haha, good one. I tried getting Opus to write Rust code (on its own suggestion), and it was not good at it: something like 10x slower and 3x buggier. I went back to Python. I’ll try checking later whether it’s any good at translating from Python to Rust once the Python code already exists… (And for fairness’ sake, I suppose I was using Opus, and he was perhaps talking about Codex.)
Is it fun to use Claude Code to code? No. Not for me. Keeping the project from degenerating into a non-functional mess is a constant uphill battle. I find it mentally taxing, in a different way than writing code manually is mentally taxing.
To sum my current view up: Seems useful, but hard to use. You’ll have to fight it/the decay it spreads in its wake every step of the way, and making a misstep will give your codebase lethal cancer.
We’ll see how I feel about it in one more week, I suppose.
My impression is that your view is very rare on social media, but it almost completely agrees with mine so far (two exceptions: fixing up 600 lines of tests that it wrote for me didn’t take nearly as long as it would have taken for me to write the tests myself, and I’ve no strong opinions on the speed/ceiling of future progress), so I’m curious how it will evolve. If it’s not too big of an ask, please reply to this comment when you have a new comment on the topic.
An update here.
This is very similar to my current experience. Perhaps I’m holding it wrong, but when I try to use Claude Code, I find that it can usually implement a feature in a way that’s as correct and efficient as I would write it, but it almost never implements it such that I am satisfied with the simplicity or organization. Typically I rewrite what it did and cut the LOC in half.
I am interested in trying out the new code simplifier to see whether it can do a good job. I have been asking Claude something like “while I test it, can you read over what you wrote and think about whether anything could be better or simpler?” and that catches a non-zero number of issues, but not the kind of substantial simplifications that it’s missing.
I am interested in trying out the new code simplifier to see whether it can do a good job
Tried it out a couple of times just now; it appears specialized for low-level, syntax-level rephrasings. It will inline functions and intermediate-variable computations that are only used once and try to distill if-else blocks into something more elegant, but it won’t even attempt anything at a higher level. It was very eager to remove Claude’s own overly verbose/obvious comments, though. Very relatable.
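Roughly the level it operates at, as a made-up before/after:

```python
# Before: once-used intermediates, a verbose if-else, an obvious comment.
def describe(n):
    # Check whether the number is even.
    is_even = n % 2 == 0
    if is_even:
        result = "even"
    else:
        result = "odd"
    return result

# After: intermediates inlined, branch distilled, comment dropped.
def describe(n):
    return "even" if n % 2 == 0 else "odd"
```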
Overall, it would be mildly useful in isolation, but I’m pretty sure you can get the same job done ten times faster using Haiku 4.5 or Composer-1 (Cursor’s own blazing-fast LLM).
Curious if you get a different experience.