It just suggests (to me) that they might be farther from that milestone than their performance on crystal-intelligence-friendly tasks would imply.
I basically agree, but we can more directly attempt extrapolations (e.g. METR horizon length) and I put more weight on this.
I also find it a bit silly when people say “AIs are very good at competitive programming, so surely they must soon be able to automate SWE” (something I have seen at least some semi-prominent frontier AI company employees imply). That said, I think AIs being good at competitive programming is substantially not due to better crystallized intelligence, but rather because competitive programming is easier to train for with RL and easier to scale up inference compute on.