shash42 comments on How to game the METR plot

shash42 21 Dec 2025 16:54 UTC
2 points
0
This is cool! I think I’m updating toward the logistic fit not mattering. The question I have now is: what would it have taken on this underlying data for the log-linear trend not to hold. My guess is models not making progress for months, and staying at similar aggregate accuracy (with success rates staying roughly inversely correlated with task length).