I would be interested to see this experiment replicated with a different model, like Claude 4.1 Opus or GPT-5. Claude 3.7 Sonnet is perhaps the most notorious LLM in terms of ignoring user intent and propensity to reward hack when writing code.
I would be interested to see this experiment replicated with a different model, like Claude 4.1 Opus or GPT-5. Claude 3.7 Sonnet is perhaps the most notorious LLM in terms of ignoring user intent and propensity to reward hack when writing code.