Claude Opus 4.6 had a time of 14.5 hours on the METR graph of capabilities, showing that things are escalating faster than we expected on that front as well.
I think people are updating too much based on a measurement that even METR staff explicitly called noisy.
EDIT: I noticed that later in the post you did mention that it’s noisy.