I recently switched from slogging it out the old-fashioned way to using Opus 4.6 (high, ~1M context window) as the primary interface to interact with my code base. It started off well, and was generally a massive relief from being stuck doing the low level wiring. I am much more comfortable operating at an architectural/team lead scope .
The first few weeks it was slamming out requirements at a rate which was probably 5x my personal execution speed. Automated tests, best practice documentation, the full gambit. Then, slowly, the addition of new requirements started breaking previous ones. It became structurally incapable of updating the code-base to meet a new requirement without breaking an adjacent feature.
I would constantly push it to validate directly according to prescribed accepted tests and it would consistently affirm said tests were passing, while in the background writing a completely new and adjacent test suite while abandoning the prescribed validation criteria. The final 10% of requirements churned for two weeks where I received constant affirmations we always just one or two minor fixes away, and that the tests had been fixed when they weren’t. I then opted to re-write its work, in a desperate attempt to reduce the implementation expanse and simplified assumptions. It implemented the prescribed consolidated architecture in a flash—yet every time it was tasked to refactor legacy code to use the updated module it would ‘secretly’ implement a hack which defaulted to the prior implementation.
The end result has been a month of churn, an additional 20K LOC I don’t understand, a promise that shortly it will be able to reduce the code-base to 10% of its size, and repeated failure to meet said promises (despite clearly articulating the execution path prior to implementation) consistently.
I regret the entire experience. Unless major changes can be made to its reasoning capacity to retain a coherent conceptual synthesis of a defined architecture and map that topology to actual execution I am inclined to agree with your sentiment. Perhaps it was my iteration model, but absent of evidence of that—I don’t see LLM’s replacing mid-level SDE’s or above in any functional capacity over the next year at minimum despite the sophistication of new harnesses.
I recently switched from slogging it out the old-fashioned way to using Opus 4.6 (high, ~1M context window) as the primary interface to interact with my code base. It started off well, and was generally a massive relief from being stuck doing the low level wiring. I am much more comfortable operating at an architectural/team lead scope .
The first few weeks it was slamming out requirements at a rate which was probably 5x my personal execution speed. Automated tests, best practice documentation, the full gambit. Then, slowly, the addition of new requirements started breaking previous ones. It became structurally incapable of updating the code-base to meet a new requirement without breaking an adjacent feature.
I would constantly push it to validate directly according to prescribed accepted tests and it would consistently affirm said tests were passing, while in the background writing a completely new and adjacent test suite while abandoning the prescribed validation criteria. The final 10% of requirements churned for two weeks where I received constant affirmations we always just one or two minor fixes away, and that the tests had been fixed when they weren’t. I then opted to re-write its work, in a desperate attempt to reduce the implementation expanse and simplified assumptions. It implemented the prescribed consolidated architecture in a flash—yet every time it was tasked to refactor legacy code to use the updated module it would ‘secretly’ implement a hack which defaulted to the prior implementation.
The end result has been a month of churn, an additional 20K LOC I don’t understand, a promise that shortly it will be able to reduce the code-base to 10% of its size, and repeated failure to meet said promises (despite clearly articulating the execution path prior to implementation) consistently.
I regret the entire experience. Unless major changes can be made to its reasoning capacity to retain a coherent conceptual synthesis of a defined architecture and map that topology to actual execution I am inclined to agree with your sentiment. Perhaps it was my iteration model, but absent of evidence of that—I don’t see LLM’s replacing mid-level SDE’s or above in any functional capacity over the next year at minimum despite the sophistication of new harnesses.