Oliver Sourbut comments on The “Length” of “Horizons”

Oliver Sourbut 15 Oct 2025 13:50 UTC
17 points
I think the benchmark is intended to measure performance on an even narrower proxy than this—roughly, the sort of tasks involved in ordinary, everyday software engineering.
Note that METR has also published a follow-up that tries to broaden the class of activities covered. Its results suggest that the qualitative pattern of an exponentially increasing time horizon is somewhat robust, though the growth rate varies between domains.