My biggest question, as always, is: “What specific piece of evidence would make you change your mind?”
Off the top of my head:
- LLMs becoming actually useful for hypothesis generation in my agent-foundations research.
- A measurable “vibe shift” where competent people start doing what LLMs tell them to (regarding business ideas, research directions, etc.), rather than the other way around.
- o4 zero-shotting games like Pokémon without having been trained to do that.
- One of the models scoring well on the Millennium Prize Benchmark.
- AI agents able to spin up a massive codebase solving a novel problem without human handholding / software engineering becoming “solved” / all competent programmers switching to “vibe coding”.
- Reasoning models’ skills starting to generalize in harder-to-make-legible ways that look scary to me.
These are bad examples! Your observations are basically “At the point where LLMs are AGI, I will change my mind.”
If it solves Pokémon one-shot, solves coding, or human beings become superfluous for decision-making in general, I would call that AGI, and if it can code by itself, it’s already taking off!
All you have shown me so far is that you can’t think of any intermediate steps LLMs still have to go through before they reach AGI.
Valid complaint, honestly. I wasn’t really going for “good observables to watch out for” there, though, just for making the point that my current model is at all falsifiable (which I think is what @Jman9107 was mostly angling for, no?).
The type of evidence I expect to actually end up updating on, in real life, if we are in the LLMs-are-AGI-complete timeline, is one of these:
- Reasoning models’ skills starting to generalize in harder-to-make-legible ways that look scary to me.
- Some sort of subtle observable or argument that’s currently an unknown unknown to me, which will make me think about it a bit and realize it upends my whole model.