In particular, even if the LLM were being continually trained (in a way similar to how LLMs are already trained, with a similar architecture), it still wouldn't do what humans do: quickly pick up new analogies, quickly create new concepts, and generally reforge concepts.
I agree this is a major unsolved problem that will be solved prior to AGI.
However, I still believe “AGI SOON”, mostly because of what you describe as the “inputs argument”.
In particular, there are a lot of things I personally would try if I were trying to solve this problem, but most of them are computationally expensive. I have multiple projects blocked on "this would be cool, but LLMs need to be 100x-1Mx faster for it to be practical."
This makes it hard for me to believe in timelines like "20 or 50 years", unless you have some private reason to think Moore's Law and algorithmic progress will stop. The cost of LLM inference, for example, is dropping by roughly 10x/year, and I have no reason to believe this stops anytime soon.
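For concreteness, here is a minimal sketch of the arithmetic behind that intuition, assuming (purely for illustration) that the ~10x/year improvement continues at a constant rate; the 10x/year figure and the 100x-1Mx targets come from the comment above, and everything else is hypothetical.

```python
import math

# Illustrative arithmetic only: assumes the ~10x/year drop in LLM inference
# cost quoted above continues at a constant rate, which is an extrapolation,
# not a guarantee.
ANNUAL_IMPROVEMENT = 10.0

def years_until_speedup(target_factor: float, annual_rate: float = ANNUAL_IMPROVEMENT) -> float:
    """Years of compounding improvement needed to reach a given overall speedup."""
    return math.log(target_factor) / math.log(annual_rate)

for factor in (1e2, 1e4, 1e6):  # spans the 100x-1Mx range mentioned above
    print(f"{factor:.0e}x faster: ~{years_until_speedup(factor):.1f} years")

# Under the assumed rate this prints roughly:
#   1e+02x faster: ~2.0 years
#   1e+04x faster: ~4.0 years
#   1e+06x faster: ~6.0 years
```

On those assumptions, even the 1Mx end of the range is reached in about six years of continued progress, which is the gap between this argument and a multi-decade timeline.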
What I mainline expect is that yes, a few OOMs more of compute and efficiency will unlock a bunch of new things to try, and yes, some of those things will make some capabilities go up a bunch, in the vein of o3. I just also expect that to level off. I would describe myself as "confident but not extremely confident" of that; I give 1 or 2% p(doom) in the next 10ish years coming from this possibility (and some more p(doom) from other sources). Why expect it to level off? Because I don't see good evidence of a thing that wouldn't level off. The jump made by LLMs, from not being able to leverage huge amounts of data and compute at all to being able to at all, is certainly a jump, but I don't see a reason to think it's a jump onto an unbounded trajectory.
I guess I should be more specific.
Do you expect this curve to flatten, or do you expect that training runs in, say, 2045 are at, say, 10^30 FLOP and have still failed to produce AGI?
My p(AGI by 2045) is higher, maybe in the ballpark of 20%, because there's been more time for algorithmic progress. I don't have strong opinions about how much people will keep doing huge training runs, though maybe I'd be kind of skeptical that people would spend $10^11 or $10^12 on runs if their $10^10 runs produced results not qualitatively very different from their $10^9 runs. But I don't know; that's both a sociological question and a question of which lesser capabilities happen to get unlocked at which exact training run sizes, given the model architectures in a decade, which of course I can't predict. So yeah, if it's 10^30 FLOP but not much algorithmic progress, I doubt that gets AGI.
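As a rough way to put the spend levels above and the hypothetical 10^30 FLOP run on the same scale, here is a minimal back-of-the-envelope sketch. The 10^17 FLOP-per-dollar figure is an assumed round number, not something from the discussion; the point is just that at fixed price-performance, 10^30 FLOP sits several orders of magnitude beyond even a $10^12 run, so reaching it would lean on hardware and algorithmic efficiency gains as well as bigger budgets.

```python
# Purely illustrative back-of-the-envelope, not a forecast. The
# FLOP-per-dollar figure below is an assumed round number chosen for
# readability; real price-performance varies and changes over time.
ASSUMED_FLOP_PER_DOLLAR = 1e17  # hypothetical value

def run_flop(dollars: float, flop_per_dollar: float = ASSUMED_FLOP_PER_DOLLAR) -> float:
    """Training compute purchasable for a given spend at fixed price-performance."""
    return dollars * flop_per_dollar

for spend in (1e9, 1e10, 1e11, 1e12):  # the spend levels discussed above
    print(f"${spend:.0e} buys ~{run_flop(spend):.0e} FLOP")

# At the same assumed price-performance, a 1e30 FLOP run would cost:
print(f"1e30 FLOP costs ~${1e30 / ASSUMED_FLOP_PER_DOLLAR:.0e}")

# Prints roughly:
#   $1e+09 buys ~1e+26 FLOP
#   $1e+10 buys ~1e+27 FLOP
#   $1e+11 buys ~1e+28 FLOP
#   $1e+12 buys ~1e+29 FLOP
#   1e30 FLOP costs ~$1e+13
```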