I agree that adjusting self-reports by the discrepancy in effect sizes estimated from the METR study is the way to go here, so a speedup of up to 1.5x seems more likely. A 4x speedup would require some really convincing evidence.
I suspect such evidence would involve giving away more details of Anthropic’s internal processes and workflows than Anthropic would be comfortable with.