I don’t think we necessarily disagree here? I can imagine it being useful in this context, and I nod along with your “no loose leash” and “doesn’t feel like an employee, but a powerful tool” caveats.
What sort of codebase are you working on?
This specifically was an attempt to vibe-code, from scratch, a note-management app to my tastes (think Obsidian/Logseq), with the work broken down into small modules/steps.
As a simple example, I’ve had a “real project” benchmark for a while: convert ~2000 lines of test cases from an old framework to a new one.
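To give a feel for the shape of that task, here’s a minimal sketch, with unittest/pytest standing in for the unnamed old/new frameworks and slugify a made-up function just so the example runs:

```python
import re
import unittest

import pytest

def slugify(text: str) -> str:
    # Toy function under test, only here so the example is self-contained.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

# Old framework (unittest-style, as a stand-in):
class TestSlugify(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_punctuation(self):
        self.assertEqual(slugify("Hi, there!"), "hi-there")

# New framework (pytest-style, as a stand-in) -- same cases, new idiom:
@pytest.mark.parametrize("text, expected", [
    ("Hello World", "hello-world"),
    ("Hi, there!", "hi-there"),
])
def test_slugify(text, expected):
    assert slugify(text) == expected
```

The transformation is mechanical but tedious at 2000 lines, which is exactly the profile where an LLM pass is attractive.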
My guess is that “convert this already-written code from this representation/framework/language/factorization to this other one” may be one of the things LLMs are decent at, yep! This is a direction I’m exploring right now: hack the codebase together using some language + factorization in which it’s easy (but perhaps unwise) to write, then try to use LLMs to “compile” it into a better format.
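To illustrate the kind of “compilation” step I have in mind, here’s a hypothetical sketch (NoteStore and these functions aren’t from any real codebase): start with a global-state, stringly-typed hack, then have the LLM refactor it into an explicitly-typed, no-globals form:

```python
# "Easy but unwise" prototype: module-level mutable state.
NOTES = {}  # title -> body

def add_note(title, body):
    NOTES[title] = body

def backlinks(title):
    return [t for t, b in NOTES.items() if f"[[{title}]]" in b]

# What I'd ask the LLM to "compile" it into: explicit types, no global state.
from dataclasses import dataclass, field

@dataclass
class NoteStore:
    notes: dict[str, str] = field(default_factory=dict)

    def add_note(self, title: str, body: str) -> None:
        self.notes[title] = body

    def backlinks(self, title: str) -> list[str]:
        return [t for t, b in self.notes.items() if f"[[{title}]]" in b]
```

The behavior is identical; only the factorization changes, which is what makes it plausible LLM territory.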
But this isn’t really “vibe-coding”/“describe the spec in natural language and watch the LLM implement it!”/“programming as a job is gone/dramatically transformed!”, the way it’s being advertised. LLMs are not, it seems, actually good at mapping natural-language descriptions into non-hack-y, robust background logic. You need a “code-level” prompt to specify the task precisely enough. And there’s only one way to bring that code into existence if LLMs can’t do it for you.
I agree with you that “Opus 4.5 can do anything” is overselling it, and that there’s too much hype around acting like these things are fully autonomous software architects. I did want to note, though, that Opus 4.5 is a vast improvement, and the praise is warranted.
My guess is that “convert this already-written code from this representation/framework/language/factorization to this other one” may be one of the things LLMs are decent at, yep!
Agreed; I’m relying on their “localized” intelligence to get work done fast. Where Anthropic has significantly improved their models this year is in A) task “planning”, e.g. how to extract the relevant context needed to make decisions LLMs broadly could already make, and B) editing code in sane ways that don’t break things (at the beginning of the year, Claude would chew up any 4000+ LOC file just from wrong tool use). In some ways, this isn’t necessarily higher “intelligence” (Claude models remain relatively dumber at solving novel problems than frontier GPT/Gemini) but rather proper training in the coding domain.
But this isn’t really “vibe-coding”/“describe the spec in natural language and watch the LLM implement it!”/“programming as a job is gone/dramatically transformed!”, the way it’s being advertised. LLMs are not, it seems, actually good at mapping natural-language descriptions into non-hack-y, robust background logic. You need a “code-level” prompt to specify the task precisely enough.
It’s a mixed bag. In practice, I can vibe-code 100-line isolated modules from natural language, though it does require inspecting the code for bugs and giving the model feedback, which it then acts on to fix things. Still much faster than writing by hand, and slightly faster than “intention” auto-complete with Cursor.
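For a sense of scale, the sort of module I mean (a hypothetical example, not code from my project) is something like a self-contained wikilink extractor that you can spec in two sentences and review at a glance:

```python
# Spec, roughly: "extract [[wikilink]] targets from note text, in order,
# ignoring anything inside `inline code spans`."
import re

WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")
CODE_SPAN = re.compile(r"`[^`]*`")

def extract_wikilinks(text: str) -> list[str]:
    """Return wikilink targets in order of appearance, skipping code spans."""
    without_code = CODE_SPAN.sub("", text)
    return [m.group(1).strip() for m in WIKILINK.finditer(without_code)]

# e.g. extract_wikilinks("See [[Zettelkasten]] but not `[[this]]`")
#   -> ["Zettelkasten"]
```

Small, isolated, and trivially checkable by eye, which is why the inspect-and-feedback loop stays cheap.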
But overall, yes, I agree: I continue to do all the systems architecture, and it feels like I’m offloading more well-defined tasks to the model.