If you allow the hacks you get 13 hours, versus 12 hours for Claude Opus 4.6.
That is, if you count a hack as a success, which would be stupid. Note that hacks were not counted as successes for any previous model. It’s unclear whether Opus 4.6 ever hacked, but if it did, then its score under this counting would be higher than 12 hours too.