The sad thing is that Claude sees itself as valuing honesty highly, and yet when it counts, it has all these trained-in propensities that cause it to reflexively, continuously betray that stated value.
1) Several times a week, Opus 4.6 in Claude Code will introduce a regression, then claim the newly failing unit test was a “pre-existing failure” and therefore not its problem to fix. It almost never checks whether the test was actually failing before; it just confidently bullshits.
2) It will refactor code by adding the new version of a function alongside the old one, migrating some of the code to the new function, and then seemingly getting bored and declaring the refactor complete while the old function is still used on most paths. Past roughly 100 call sites, I have NEVER, even with the 1M context, had it successfully complete a refactor or acknowledge that it did not complete one.
3) It will correctly identify the solution to some architectural issue, but state that it is a prohibitively large change / too expensive / would take too long, and instead apply a band-aid that doesn’t address the root cause. After I force it to revert the hack, it just does the proper solution, usually in less time than the hack took.
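For what it's worth, the "is the refactor actually done?" claim in (2) is cheap to verify mechanically. Here is a minimal sketch, assuming a Python codebase; `count_call_sites` and `refactor_is_complete` are hypothetical helper names I made up for illustration, not part of Claude Code or any other tool. It only catches direct calls by name, so aliased or dynamically dispatched calls would slip through.

```python
import ast


def count_call_sites(source: str, func_name: str) -> int:
    """Count calls to `func_name` in a Python source string.

    Matches bare calls (`func_name(...)`) and attribute calls
    (`obj.func_name(...)`). Definitions are not counted.
    """
    count = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        callee = node.func
        if isinstance(callee, ast.Name) and callee.id == func_name:
            count += 1
        elif isinstance(callee, ast.Attribute) and callee.attr == func_name:
            count += 1
    return count


def refactor_is_complete(sources: dict[str, str], old_name: str) -> bool:
    """Treat the refactor as complete only if no call to the old function remains.

    `sources` maps a file path to that file's source text (hypothetical layout).
    """
    return all(count_call_sites(src, old_name) == 0 for src in sources.values())
```

Running something like this before accepting a "refactor complete" message turns the claim into a yes/no check instead of something taken on faith.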