Curious if you still stand by this? Most of it seems directionally wrong to me at this point, but I'm not confident.
There’s a lot of nuance to expand on, and I’ll maybe write a proper update at some point, but overall...
Yeah, I pretty much stand by it.
Most of the new gains are from RL, not from pretraining. RL so far seems brittle, and the frontier remains jagged. LLMs are getting more useful for programming and math, but in narrowly scoped ways, and they break hard off-distribution (e.g. this). They keep getting better in the ways they’ve been getting better, but there are no qualitative improvements. Vibes-wise, they still have no “spark” needed for novel problem-solving, and every supposed result showing otherwise turns out to be bullshit (e.g., GPT-5.x’s recent physics breakthrough). There’s still nothing really “there”, no “true agency”, just increasingly better approximations of something-being-there which don’t seem to be on track to metamorphose into the real thing.
See this, this (with one, two, three lead-ups), and this for my more recent write-ups informing the above perspective.
The world at large seems to be getting ever more freaked out, but as far as I can tell, it’s because the LLM capabilities that I more or less priced in all the way back at GPT-3.5 are becoming viscerally legible to ever more people. Which is nice as far as raising the profile of the AI Notkilleveryoneism cause goes, but none of this provoked much of an update from me.
Ok, yeah, I think you're wildly undervaluing agentic harnesses, context engineering, etc.
For a fixed model and a fixed amount of inference, we now have significantly better infra built around it, and for me at least (and for most other people I know who use these tools) it makes a huge difference, even as the chatbot itself has only become slightly better at general common sense and much better in narrow domains.
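To make "agentic harness" concrete: the value-add is the loop *around* the model, not the model itself. Here's a toy sketch of that loop with a stubbed model and a single hypothetical tool (all names here are illustrative, not any real API):

```python
# Toy sketch of an agentic harness: the harness runs a loop that executes
# tool calls the model requests and feeds the results back into context,
# until the model emits a final answer. The "model" is a trivial stub.

def stub_model(transcript):
    """Stand-in for an LLM: requests one tool call, then answers."""
    if not any(line.startswith("TOOL:") for line in transcript):
        return "CALL read_file notes.txt"
    return "ANSWER done"

def run_harness(model, tools, max_steps=5):
    transcript = ["TASK: summarize notes.txt"]
    for _ in range(max_steps):
        action = model(transcript)
        if action.startswith("ANSWER"):
            # Model is finished; return its answer plus the built-up context.
            return action.removeprefix("ANSWER ").strip(), transcript
        # Otherwise parse "CALL <tool> <arg>", execute it, append the result.
        _, tool_name, arg = action.split(" ", 2)
        result = tools[tool_name](arg)
        transcript.append(f"TOOL: {result}")
    return None, transcript

# Hypothetical tool: "reads" a file (faked here to keep the sketch self-contained).
tools = {"read_file": lambda path: f"(contents of {path})"}
answer, transcript = run_harness(stub_model, tools)
```

The point of the sketch: the model function is fixed, yet everything interesting (which tools exist, what goes into the transcript, when to stop) lives in `run_harness` — which is exactly the layer that's improved a lot recently.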