If claude sonnet 4 in claude code has access to edit a test, and is asked to investigate and fix why the test is failing, then its default action is to modify the test to pass unconditionally (while still looking like it tests the code) and then lie about it, telling a long story about how it cleverly found and solved an underlying logic error.
he presumably means “the AIs reads the unit test then rewrite the tested code so it overfits on the test, e.g. by using the magic numbers in the unit test.”
he might alternatively mean “the AI changes the unit test to be less strict” but this would be easy to fix with permission access.
Can you clarify what you mean by “break the unit test so that it falsely passed”?
If claude sonnet 4 in claude code has access to edit a test, and is asked to investigate and fix why the test is failing, then its default action is to modify the test to pass unconditionally (while still looking like it tests the code) and then lie about it, telling a long story about how it cleverly found and solved an underlying logic error.
he presumably means “the AIs reads the unit test then rewrite the tested code so it overfits on the test, e.g. by using the magic numbers in the unit test.”
he might alternatively mean “the AI changes the unit test to be less strict” but this would be easy to fix with permission access.