I do believe that AIs will eventually surpass humans at fluid intelligence, though I’m highly uncertain as to the timeline.
My point here is really just the oft-repeated observation that when we see an AI do X, intuitively we tend to assess the AI the way we would assess a human being who could do X, and that intuition can lead to very poor estimates of whether the AI can also do Y. (For instance, bar exam → practicing law.) Similarly, the ratio of fluid to crystal intelligence may capture much of the reason that AIs are approaching superhuman status at competition coding problems but are still far from superhuman at many real-world coding tasks. It doesn’t mean AIs will never get to real-world tasks. It just suggests (to me) that they might be farther from that milestone than their performance on crystal-intelligence-friendly tasks would imply.
It just suggests (to me) that they might be farther from that milestone than their performance on crystal-intelligence-friendly tasks would imply.
I basically agree, but we can more directly attempt extrapolations (e.g. METR horizon length) and I put more weight on this.
I also find it a bit silly when people say “AIs are very good at competition programming, so surely they must soon be able to automate SWE” (a thing I have seen at least some semi-prominent frontier AI company employees imply). That said, I think AIs being good at competitive programming is largely not due to better crystallized intelligence; it is instead due to competitive programming being easier to train for with RL and easier to scale up inference compute on.