It might still be impressive, but models are largely remixing many things they have seen in great detail during training (many impressive headline results have even been determined to be the model re-using existing implementations/PRs via search instead of coming up with actually-new ones!). This is not about LLMs not doing impressive things! This is about precisely describing their capability profile, where it comes from, and whether more of the same (e.g., scale) gets you a whole new set of impressive outcomes (e.g., novel R&D that isn’t just remixing existing research).
Do you think that today’s breakthrough on the planar unit distance problem is merely the model remixing things learned during pretraining? I’m not an expert, but it seems unlikely to me. Arul Shankar, a notable number theorist, stated:
In my opinion this paper demonstrates that current AI models go beyond just helpers to human mathematicians – they are capable of having original ingenious ideas, and then carrying them out to fruition.
In anticipation of people using that example as a response, I added a follow-up tweet to the tweet version of this post:
yes, dear reader, the new erdos unit distance problem solved by OpenAI’s model still fits within this story. in fact, i expected these kinds of results.
My understanding is that the main difficulty in arriving at this result is performing sufficient inference over the entire search space, far beyond what any human can do. And so, it is indeed superhuman in that specific capability.
But I suspect that every step to get there was within distribution, and in-context learning (ICL) was enough to arrive at the result. However, I also suspect that there are many other types of problems that require integrating knowledge into the weights and can’t be solved with long chains of “within-distribution” reasoning that sum to a new result.
So, I suspect the bottleneck in solving the problem in the past was more about human inability to search through an extremely large number of paths.
This is not to say what the model did isn’t impressive, but imo this is within the realm of problems I’d expect it to solve (which is a lot!). There are other types of problems for which I expect models would need to consolidate information (in the weights) while solving them. Though who is to say they aren’t doing that? I don’t know for sure what OAI is doing.
FYI, I’m running on the assumption that proofs will be cheap/free in the coming years (for most applied math, likely sooner) and partially betting my startup on this, which is another reason I expected this kind of result.
And I think this much is clear by looking over the proof and supplemental materials.