Not speaking for anyone else at METR, but I personally think it’s inherently difficult to raise the salience of something like time horizon during a period of massive hype without creating some degree of hype about the benchmark, and the overall project impact is still highly net positive.
Basically, companies already believe in and explicitly aim for recursive self-improvement, but the public doesn’t. Therefore, we want to tell the public what labs already believe—that RSI could be technically feasible within a few years, that current AIs can do things that take humans a couple of hours under favorable conditions, and that there’s a somewhat consistent trend. We help the public make use of this info to reduce risks, e.g. by communicating with policymakers and helping companies formulate RSPs, which boosts the ratio of benefit to cost.
You might still think: how large is the cost? Well, the world would look pretty different if investment towards RSI were the primary effect of the time horizon work. Companies would be asking METR how to make models more agentic, enterprise deals would be decided based on time horizon, and we’d see leaked or public roadmaps from companies aiming for 16-hour time horizons by Q2. (Being able to plan for the next node is how Moore’s Law likely sped up semiconductor progress; this is much more difficult to do for time horizon for various reasons.) Also, the amount of misbehavior—especially power-seeking—from more agency has been a bit below my expectations, so it’s unlikely we’ll push things above a near-term danger threshold.
If we want to create less pressure towards RSI, it’s not clear what to do. There are some choices we made in the original paper and the current website, like not color-coding the models by company, discussing risks in several sections of the paper, not publishing a leaderboard, keeping many tasks private (though this is largely for benchmark integrity), and adding various caveats and follow-up studies. More drastic options include releasing numbers less frequently, making a worse benchmark, or doing less publicity; none of these seem appealing in the current environment, although they might become so in the future.
Seems reasonable.