An important point: I frame this as creating insights at least semi-reliably, rather than as being able to create any insights at all. LLMs have already created insights, so the problem can’t be a fundamental incapacity; it must be a practical one.
To be clear, practical incapability can be nearly as bad/good as fundamental incapability, so this nitpick doesn’t matter too much.
I consider these examples to be pretty weak. I was arguing that LLMs had not made novel insights (which inspired Abram to create the question), and I’m still surprised no one has mentioned anything more convincing months later. It seems like even stumbling around blindly in idea-space would produce more original ideas.
The math result just fills in a step that the researchers could have done themselves and doesn’t seem to go beyond standard techniques. Given the IMO results, one would expect better than this by now (I guess we’ll see soon).
The chemistry one… even if it really didn’t appear anywhere on the internet, maybe LLMs just suggest a lot of ideas and we only hear about the ones that worked. If I heard A LOT of examples like this, it would be convincing. But one just seems like getting lucky.
The key point is that I only need one example, so it doesn’t matter whether or not the LLM simply got it lucky.
However, I do agree that if the LLM merely got lucky, this would be very bad in a practical sense, since such insights would then take a long time to repeat by pure chance.
A key part of my worldview is that it’s rarely productive to debate whether AI is fundamentally incapable of a certain task, because if we allow arbitrary resources, and especially arbitrary designs subject only to the loosest requirement of mathematical existence, we can solve any problem with AI/create ASI trivially. The actual question is whether a given approach can do the task with limits on resources.
Another way to say it is that you should always focus on the quantitative questions over the qualitative questions.
In general, overfocusing on fundamental barriers that remain even with infinite compute, rather than on the finite-compute case, has been one of the biggest obstacles to AI progress/good AI discourse over the years. (Admittedly, for theoretical analysis like computational complexity, part of the issue is that proving things about computation is quite hard; much of the difficulty comes from the fact that we believe cryptography exists, combined with our being quite bad at opening up the black box of a computer, which is necessary to solve many problems in computational complexity.)
I’ll quote from a recent comment here:
My take on why recursion theory failed to be relevant for today’s AI is that what a machine could do if unconstrained turned out basically not to matter at all, and in particular the limits of an idealized machine didn’t matter: once we actually impose constraints that force computation to use very limited resources, we get a non-trivial theory, and importantly all of the difficulty of explaining how humans do stuff lies there.
There was too much focus on “could a machine do something at all?” and not enough focus on “what could a machine with severe limitations do?”
The reason is that in a sense, it is trivial to solve any problem with a machine if I’m allowed zero constraints except that the machine has to exist in a mathematical sense.
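To make this concrete, here is a minimal sketch (my own illustration, not from the quoted comment or the paper) of why an unconstrained machine trivially “solves” any problem whose solutions can be checked: just enumerate every candidate until one verifies. The function name and toy verifier are hypothetical; the point is that the construction only becomes interesting once you bound its time and space.

```python
from itertools import count, product

def solve_by_unbounded_search(verify, alphabet="01"):
    """Enumerate all finite strings over `alphabet` until one passes `verify`.

    With no resource constraints, this 'solves' any problem whose solutions
    can be checked -- which is exactly why the real question is what machines
    can do under severe time and space limits, not what they can do at all.
    """
    for length in count(0):  # lengths 0, 1, 2, ...
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if verify(candidate):
                return candidate

# Toy problem: find a binary string that encodes the number 5.
result = solve_by_unbounded_search(lambda s: s != "" and int(s, 2) == 5)
# -> "101"
```

Of course, this brute-force search takes time exponential in the solution length, which is precisely the kind of cost that recursion theory ignores and complexity theory makes central.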
A good example of this is the paper on A Universal Hypercomputer, which shows how absurdly powerful computation can be if you are truly unconstrained:
Another smaller point: I have a somewhat more continuous model of how much creativity AIs currently have. There’s a point to be made that sufficiently terrible creativity is in practice no different from having zero creativity, and it might turn out to be too inefficient to scale creativity, because in-context learning scales poorly (context windows are not a sufficient replacement for long-term memory, and continual thinking is just far more efficient than pure mesa-optimization). But I do claim that literally zero creativity or insight is not very likely on priors.
Really, this could be fixed by simply switching the question from “Are LLMs creative/do they have insights at all?” to “Is there any way to get an LLM to autonomously generate and recognize genuine scientific insights at least at the same rate as human scientists?”
And I’d easily agree that so far, LLMs have not actually done this, especially if we require it to be as good as the best human scientists.
To be clear, I’m not smuggling in any claim that, since LLMs can produce a non-zero number of insights, this reliably scales with the current architecture into enough insights to replace AI researchers and start an AI takeoff by, say, 2030.