Sorry, I made that up; it seemed like roughly the sort of thing that was happening. I should pick an example from the actual cases.
For what it’s worth, I would expect the behavior you described, and suspect a malicious explanation overall.
Anecdotally, Claude lies/cheats an enormous amount (far more than comparable frontier models).