Here’s an argument for a capabilities plateau at the level of GPT-4 that I haven’t seen discussed before. I’m interested in any holes anyone can spot in it.
One obvious hole would be that capabilities did not, in fact, plateau at the level of GPT-4.
I thought the argument was that progress has slowed down immensely. The softer form of this argument is that LLMs won’t plateau, but that progress will slow to such a crawl that other methods surpass them. The arrival of o1 and o3 suggests this has already happened, at least in limited domains, and hybrid training methods (and perhaps hybrid systems) will probably go on to surpass base LLMs in all domains.
There’s been incremental improvement and various quality-of-life features (more pleasant chatbot personas, tool use, multimodality, gradually better math/programming performance) that make the models useful to gradually larger demographics, et cetera.
But it’s all incremental, no jumps like 2-to-3 or 3-to-4.
I see, thanks. Just to make sure I’m understanding you correctly, are you excluding the reasoning models, or are you saying there was no jump from GPT-4 to o3? (At first I thought you were excluding them in this comment, until I noticed the “gradually better math/programming performance.”)
I think the jump from GPT-4 to o3 represents non-incremental narrow progress, but, at best, only incremental general progress.
(It’s possible that o3 does “unlock” transfer learning, or that o4 will, et cetera, but we’ve seen no indication of that so far.)