Great post. I think the central claim is plausible, and would very much like to find out I’m in a world where AGI is decades away instead of years. We might be ready by then.
If I am reading this correctly, there are two specific tests you mention:
1) GPT-5 level models come out on schedule (as @Julian Bradshaw noted, we are still well within the expected timeframe based on trends to this point)
2) LLMs or agents built on LLMs do something “important” in some field of science, math, or writing
On test 2, I would add that almost no humans have done this either. We don’t have a clear explanation for why some humans have much more of this capability than others, even though all human brains run on similar hardware and software. This suggests the number of additional insights needed to boost us from “can’t do novel important things” to “can do” may be as small as zero, though I don’t think it is actually zero. In any case, I am hesitant to embrace a test for AGI that a large majority of humans fail.
In practical terms, suppose this summer OpenAI releases GPT-5-o4, and by winter it’s the lead author on a theoretical physics or pure math paper (or at least the main contributor—legal considerations about personhood and IP might stop people from calling AI the author). How would that affect your thinking?
Fair enough, thanks.
My own understanding is that, other than maybe in writing code, no one has actually given LLMs the kind of training a talented human gets on the way to becoming capable of novel and useful intellectual work. An LLM has a lot of knowledge, but knowledge isn’t what makes useful and novel intellectual work achievable. A non-reasoning model gives you the equivalent of a top-of-mind answer. A reasoning model with a large context window and chain of thought can do better and solve more complex problems, but still mostly those within the reach of a newly hired college graduate or grad student.
I genuinely don’t know whether an LLM with proper training can do novel intellectual work at current capability levels. Finding out in a way I’d find convincing would take someone giving it the hundreds of thousands of dollars and subjective years’ worth of guidance, feedback, and iteration that humans get. And really, you’d have to do this at least hundreds of times, for different fields and with different pedagogical methods, to even slightly satisfactorily demonstrate a “no,” because 1) most humans empirically fail at this, and 2) those that succeed don’t all do so in the same field or by the same path.