Except that the METR researchers also had their share of issues with SWE-bench Verified (see, e.g., page 38 of their article). By the time o1 was released (is it even possible to have it reason without the CoT, like in your evaluation? And what about GPT-4.5?), its METR horizon was shorter than its SWE-bench horizon. If the SWE-bench horizon depends log-linearly on the METR time horizon, then a 40hr horizon on SWE-bench would likely correspond to a far shorter METR time horizon[1], which is likely absurd.
Additionally, I wonder if sabotage of alignment R&D could be easier than we think. Suppose, for example, that Max Harms’ CAST formalism were transformed into a formalism leading to an entirely different future, using no-CoT reasoning equivalent to far less than 16 minutes of human reasoning. Is that plausible?
[1] For example, between GPT-4o and o1 the SWE horizon improved ~8.5 times and the METR horizon improved ~4.5 times. Were the SWE horizon to improve 40 more times, a log-linear fit through those two points would mean that the METR horizon increased by only ~8-16 times, to just 4-9 hrs. Could you share the SWE-bench-verified horizons that you used for the 5-month doubling time?
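To make the arithmetic explicit, here is a rough Python sketch of that extrapolation. It assumes the two horizons are related by a power law (a straight line in log-log space) and takes the slope from the single GPT-4o to o1 step. The function name is mine, and the ratios are just the approximate ones quoted in this comment, not fitted values.

```python
import math

def implied_metr_growth(swe_ratio_ref, metr_ratio_ref, swe_ratio_further):
    """Extrapolate METR-horizon growth from SWE-horizon growth.

    Assumption: log(METR horizon) is linear in log(SWE horizon), with the
    slope estimated from one reference step (here GPT-4o -> o1).
    """
    slope = math.log(metr_ratio_ref) / math.log(swe_ratio_ref)
    return swe_ratio_further ** slope

# Approximate ratios quoted above: SWE horizon ~8.5x, METR horizon ~4.5x.
growth = implied_metr_growth(8.5, 4.5, 40.0)
print(f"implied METR horizon growth: ~{growth:.1f}x")
```

With these inputs the implied multiplier falls inside the ~8-16x range mentioned above. The result is sensitive to the two reference ratios, which is part of why I would like to see the actual SWE-bench-verified horizons.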