It has been less than one year since I posted my model of what is going on with LLMs: https://www.lesswrong.com/posts/vvgND6aLjuDR6QzDF/my-model-of-what-is-going-on-with-llms
It already does not seem to hold up very well. I underestimated how far reasoning models based on chain-of-thought could scale. More generally, I'm now more convinced that labs can consistently convert money into capabilities by training transformers. In particular, I was previously very skeptical that RL would "just happen" to be solved with the same techniques as sequence prediction / text generation. Sequential decision making (e.g. Pokémon) does seem to remain the biggest weakness, by far, relative to other capabilities, and agents are not very useful outside of coding, but this may be a transient issue with perception or simply training focus (it's still not clear, though I would bet that way).
I think progress is very hard to predict, and trying is not very constructive (at least for me). In particular, I've updated towards thinking that my area of expertise (e.g. AIXI) may not be very helpful for making such predictions, compared to more direct experience with the models. I also suspect my reasoning has been corrupted by optimism, which prevented me from viewing the labs as competent adversaries equipped to overcome many of the obstacles I hoped would stop them.
There still have not been very convincing original insights from LLMs, and it's possible that I'm updating too far and early-2025 me will turn out to be more right than late-2025 me. One problem is that there really is a great deal of hype and advertising around LLMs, and strong incentives to inflate the significance of their accomplishments, so it's hard to say exactly how good they are at things like math except through first-hand experience (until they start one-shotting seriously hard problems). But even from first-hand experience alone, IF I HAD TO BET, I'd say the writing is on the wall.
AI has really become the new polarizing issue. One camp thinks it's the future: "just extrapolate the graphs", "look at the coding", "AGI is near", "the risks are real". The other camp thinks it's pure hype: "it's a bubble", "you're just saying that to make money", "plagiarizing slop machine", "the risks are science fiction". It is literally impossible to tell what's going on with AI based on the wisdom of the crowd.
I watched a video that tried to explain what AI can't yet do. It was extremely bland and featured areas like "common sense" (I thought "common sense questions" were one of the main things LLMs had solved!). It did not name a single specific task AI can't yet do, because nobody can tell right now. I lost track of AI capabilities after the o3 rollout.