Yep, you did explicitly state that you expect LLMs to keep getting better along some dimensions; however, the quote I was responding to seemed too extreme in isolation. I agree that the vibe-bias is a thing (I'm susceptible to "sounding smart" too); I guess part of what I wanted to get across is that it really depends on how you're testing these things. If you have a serious use-case involving real cognitive labor & you keep going back to it whenever a new model is released, you'll be much harder to fool by vibes.
Now, granted, in the limit of infinitely precise conceptual resolution, LLMs would develop the abilities to autonomously act and innovate. But what seems to be happening is that performance on some tasks ("chat with a PDF") scales with conceptual resolution much better than performance on other tasks ("prove this theorem", "pick the next research direction"), and conceptual resolution isn't improving as fast as it once was (as in, e.g., the GPT-3 to GPT-4 jump).
& notably, it’s extremely slow to improve on some tasks (eg multiplication of numbers, even though multiplication is quadratic in number of tokens & transformers are also quadratic in number of tokens).
I somewhat think that conceptual resolution is still increasing about as quickly; it’s just that there are rapidly diminishing returns to conceptual resolution, because the distribution of tasks spans many orders of magnitude in conceptual-resolution-space. LLMs have adequate conceptual resolution for a lot of tasks, now, so even if conceptual resolution doubles, this just doesn’t “pop” like it did before.
(Meanwhile, my personal task-distribution has a conceptual resolution closer to the current frontier, so I am feeling very rapid improvement at the moment.)
Humans have almost-arbitrary conceptual resolution when needed (EG we can accurately multiply very long numbers if we need to), so many of the remaining tasks not conquered by current LLMs (EG professional-level math research) probably involve much higher conceptual resolution.