Yup, that is indisputable. Further, it’s possible that LLMs scale to a transformative technology, to the Singularity, and/or to omnicide. (Though it’s not-that-likely, on my model; I think I still give this cluster of scenarios ~20%.)
Including Towards AGI
I don’t think so. I’m more sure of LLMs not scaling to AGI than ever.
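Over the last few months, I’ve made concentrated efforts to get firsthand experience of frontier LLMs, for math-research assistance (o3) and coding (Opus 4.1). LLMs’ capabilities turned out to be below my previous expectations, precisely along the dimension of agency/“actual understanding”.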
Doubtlessly part of it is a prompting-skill issue on my part. Still, I don’t think the gap between the performance I’ve managed to elicit and the top performance is that big. For one, my experience echoes those of many other mathematicians/software engineers, and also METR’s results (on agency horizons, often-negative productivity effects, and LLM-code-is-not-mergeable).
These things have no clue what’s going on, there’s nobody in there. Whatever algorithms they are running, those are not the algorithms of a general intelligence, and there’s no reason to believe they’re on some sort of “sliding scale” to it.
I’ve still found them useful. If METR’s trend actually holds, they will indeed become increasingly more useful. If it actually holds to >1-month tasks, they may actually become transformative within the decade. Perhaps they will automate the within-paradigm AI R&D[1], and it will lead to a software-only Singularity that will birth an AI model capable of eradicating humanity.
But that thing will still not be an AGI. This would be the face of our extinction:
We should pause to note that a Clippy² still doesn’t really think or plan. It’s not really conscious. It is just an unfathomably vast pile of numbers produced by mindless optimization starting from a small seed program that could be written on a few pages. [...] When it ‘plans’, it would be more accurate to say it fake-plans; when it ‘learns’, it fake-learns; when it ‘thinks’, it is just interpolating between memorized data points in a high-dimensional space, and any interpretation of such fake-thoughts as real thoughts is highly misleading; when it takes ‘actions’, they are fake-actions optimizing a fake-learned fake-world, and are not real actions, any more than the people in a simulated rainstorm really get wet, rather than fake-wet. (The deaths, however, are real.)
This seems unlikely to me on balance. I think compute scaling will run out well before that. I think it’s possible to scale LLMs far enough to achieve this, but that it’s “possible” in a very useless way. A Jupiter Brain-sized LLM can likely do it (and probably just an Earth Brain-sized one), but we are not building a Jupiter Brain-sized LLM.
But maybe I’m wrong; maybe we do have enough compute.
1. Imagine an infinitely large branching lookup table/flowchart. It maps all possible sequences of observations to sequences of actions picked to match the behavior of a general intelligence. Given a hypercomputer to run it, would that thing be effectively an AGI, for all intents and purposes? Sure. But would it actually be an AGI, structurally? Nope.
Remove the hypercomputer assumption and swap the infinitely large flowchart for a merely unfathomably large one. Suddenly the flowchart stops implementing general intelligence exactly and is relegated to an approximation of it. And that approximation is not that good, and rapidly degrades as you scale the available compute down.
Can a Galaxy Brain-scale flowchart like this kill humanity? Maybe, maybe not: combinatorial numbers are larger than astronomical numbers. But there are sizes big enough that a flowchart that large would be able to ape an AGI’s behavior well enough to paperclip us.
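To make the combinatorics concrete, here is a toy sketch of a lookup-table “agent” (purely illustrative; the observation alphabet, history length, and arbitrary “policy” below are made-up parameters, not anyone’s actual proposal):

```python
from itertools import product

# Toy "giant lookup table" agent: it maps every possible observation
# history to a hardcoded action. A real GLUT would hardcode whatever
# a general intelligence would have done for each history.
OBSERVATIONS = ["A", "B"]   # 2-symbol observation alphabet (made up)
ACTIONS = ["left", "right"]
HISTORY_LENGTH = 4          # how much context the table distinguishes

# Enumerate every possible observation history and assign some action.
table = {
    history: ACTIONS[sum(map(ord, history)) % len(ACTIONS)]
    for history in product(OBSERVATIONS, repeat=HISTORY_LENGTH)
}

def act(history: tuple) -> str:
    """Look up the action for an exact observation history."""
    return table[history]

print(len(table))                   # 2**4 = 16 entries
print(act(("A", "B", "B", "A")))    # "left"

# The table has |observations| ** history_length entries: it grows
# combinatorially, not linearly, in how much context it must cover,
# which is why only absurd, Galaxy Brain-scale sizes approximate a
# general intelligence well.
```

(Real systems are of course not literal lookup tables; the point of the sketch is only how approximation quality scales with size.)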
2. Imagine Cyc. It was (is) an effort to build AGI. Its core motivation is as follows: There is no “simple algorithm for intelligence”. Intelligence is a mess of ad-hoc heuristics, and generality/autonomous learning emerges from those heuristics once some critical mass is attained. The way to AGI, then, is to do the hard, dirty work of inputting that critical mass of heuristics into your AI system (instead of lazily hoping for some sort of algorithmic shortcut), and eventually it would take off and start outputting novel discoveries:
The more knowledge Cyc has, the easier it is for Cyc to learn more. At first, it must be spoon-fed knowledge with every entry entered by hand. As it builds a basic understanding of the world, it would be able to parse sentences half-way from natural language to logic, and the ontologists would help finish the job, and the more it knew, the better it could parse, saving more time, until it would start parsing without human help. On that day, the “knowledge pump” would finally, triumphantly, be primed, and Cyc would start pumping and pumping, and more knowledge would just keep pouring out without any exhaustion, ushering a new golden age.
To realize this vision, Cycorp hired tons of domain experts to extract knowledge from:
Cyc would try to solve a problem, and fails by timing out. The ontologists at Cyc would call up a human expert and ask, “How did you do this?” and the expert would explain how they would solve it with quick rules of thumb, which the ontologists would write into Cyc, resulting in more assertions, and possibly more inference engines.
And they scaled it to a pretty ridiculous degree:
The number of assertions grew to 30M, the cost grew to $200M, with 2000 person-years.
And they had various fascinating exponential scaling laws:
[T]he growth of assertions is roughly exponential, doubling every 6 years. At this rate, in 2032 Cyc can expect to reach 100M assertions, the hoped-for point at which Cyc would know as much as a typical human.
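As a quick sanity check on the quoted figures (a sketch; the baseline year for the ~30M count is my assumption, since the quote doesn’t state it):

```python
import math

# Figures from the quoted passage: ~30M assertions, doubling every
# 6 years, with ~100M as the hoped-for "knows as much as a typical
# human" threshold.
current_assertions = 30e6
target_assertions = 100e6
doubling_time_years = 6

# Solve 30e6 * 2**(t / 6) = 100e6 for t.
years_needed = doubling_time_years * math.log2(target_assertions / current_assertions)
print(round(years_needed, 1))  # ~10.4 years

# If the 30M figure is from the early 2020s (my assumption), ten-odd
# more years of doubling lands right around the quoted 2032.
```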
But that project seems doomed. Sure, much like the flowchart, if scaled sufficiently far, this AGI-as-coded-by-ancient-Greek-philosophers would approximate a general intelligence well enough to be interesting/dangerous. But it would not have the algorithms of a general intelligence internally, and as you scale the available compute down, the approximation’s quality would degrade rapidly.
A Jupiter Brain-sized Cyc can probably defeat humanity. But Cycorp does not have Jupiter Brain-scale resources.
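3. Imagine the Deep Learning industry: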
When we see frontier models improving at various benchmarks we should think not just of increased scale and clever ML research ideas but billions of dollars spent paying PhDs, MDs, and other experts to write questions and provide example answers and reasoning targeting these precise capabilities. With the advent of outcome based RL and the move towards more ‘agentic’ use-cases, this data also includes custom RL environments which are often pixel-perfect replications of commonly used environments such as specific websites like Airbnb or Amazon, browsers, terminals and computer file-systems, and so on alongside large amounts of human trajectories exhaustively covering most common use-cases with these systems.
In a way, this is like a large-scale reprise of the expert systems era, where instead of paying experts to directly program their thinking as code, they provide numerous examples of their reasoning and process formalized and tracked, and then we distill this into models through behavioural cloning.
Indeed, this is exactly like a large-scale reprise of the expert systems era. The same notion that there’s no simple algorithm for intelligence, that it’s just a mess of heuristics; that attaining AGI just requires the “hard work” of scaling compute and data (instead of lazy theorizing about architectures!); the expectation that if they just chisel-in enough domain-specific expertise into DL models, generality would spontaneously emerge; the hiring of experts to extract that knowledge from; the sheer ridiculous scale of the endeavor. The only thing that’s different is handing off the coding to the SGD (which does lead to dramatic efficiency improvements).
Does that paradigm scale, in the limit of infinite compute, to perfectly approximating the external behavior of generally intelligent entities? Yes. But any given LLM, no matter how big, would not be structured as a general intelligence internally, and the approximation’s quality would degrade rapidly as you scale it down.
But how rapidly? A Jupiter Brain-sized LLM can probably kill us. But can an Earth Brain-sized, or, say, a “10% of US’ GDP”-sized LLM, do it?
I don’t know. Maybe, maybe not. But eyeballing the current trends, I expect not.
Now, a fair question to ask here is: does this matter? If LLMs aren’t “real general intelligences”, but it’s still fairly plausible that they’re good-enough AGI approximations to drive humanity extinct, shouldn’t our policy be the same in both cases?
To a large extent, yes. But building gears-level models of this whole thing still seems important.
“Within-paradigm” as in, they will not be able to switch themselves over to an innovative neurosymbolic architecture, as (IIRC) happens in AI-2027; they would just speed up the existing algorithmic-efficiency, data-quality, and RL-environment scaling laws.
I think the IMO results strongly suggest that AGI-worthiness of LLMs at current or similar scale will no longer be possible to rule out (with human efforts). Currently, the absence of continual learning makes them clearly non-AGI, and in-context learning doesn’t necessarily get them there with feasible levels of scaling. But some sort of post-training-based continual learning likely won’t need more scale, and the difficulty of figuring it out remains unknown, as it only got into the water supply as an important obstruction this year.
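I’m not so confident about this.
It seems to me that IMO problems are not so representative of real-world tasks faced by human-level agents.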
The key things from solving IMO-level problems (it doesn’t matter if it’s proper gold or not) are difficulty reasonably close to the limit of human ability in a somewhat general domain, and correctness grading being somewhat vague (natural language proofs, not just answers). This describes most technical problems, so it’s evidence that for most technical problems of various other kinds, similar methods of training are not far off from making LLMs capable of solving them, and that LLMs don’t need much more scale to make that happen. (Perhaps they need a little bit more scale to solve such problems efficiently, without wasting a lot of parallel compute on failed attempts.)
More difficult problems that take a lot of time to solve (and depend on learning novel specialized ideas) need continual learning to tackle them. Currently, in-context learning is the only straightforward way of getting there, by using contexts with millions or tens of millions of tokens of tool-using reasoning traces, equivalent to years of working on a problem for a human. This doesn’t work very well, and it’s unclear if it will work well enough within the remaining scaling in the near term, with 5 GW training systems and the subsequent slowdown. But it’s not ruled out that continual learning can be implemented in some other way, by automatically post-training the model, in which case it’s not obvious that there is anything at all left to figure out before LLMs at a scale similar to today’s become AGIs.
The way you’re using this concept is poisoning your mind. Generality of a domain does imply that if you can do all the stuff in that domain, then you are generally capable (and, depending, that could imply general intelligence; e.g. if you’ve ruled out GLUT-like things). But if you can do half of the things in the domain and not the other half, then you have to ask whether you’re exhibiting general competence in that domain, vs. competence in some sub-domain and incompetence in the general domain. Making this inference enthymemically is poisoning your mind.
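Cf. https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce#_We_just_need_X__intuitions: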
For example, suppose that X is “self-play”. One important thing about self-play is that it’s an infinite source of data, provided in a sort of curriculum of increasing difficulty and complexity. Since we have the idea of self-play, and we have some examples of self-play that are successful (e.g. AlphaZero), aren’t we most of the way to having the full power of self-play? And isn’t the full power of self-play quite powerful, since it’s how evolution made AGI? I would say “doubtful”. The self-play that evolution uses (and the self-play that human children use) is much richer, containing more structural ideas, than the idea of having an agent play a game against a copy of itself.
Most instances of a category are not the most powerful, most general instances of that category. So just because we have, or will soon have, some useful instances of a category, doesn’t strongly imply that we can or will soon be able to harness most of the power of stuff in that category. I’m reminded of the politician’s syllogism: “We must do something. This is something. Therefore, we must do this.”.
What I meant by general domain is that it’s not overly weird in the mental moves that are relevant there, so training methods that can create something that wins IMO are probably not very different from training methods that can create things that solve many other kinds of problems. It’s still a bit weird, high school math with olympiad addons is still a somewhat narrow toolkit, but for technical problems of many other kinds the mental move toolkits are not qualitatively different, even if they are larger. The claim is that solving IMO is a qualitatively new milestone from the point of view of this framing, it’s evidence about AGI potential of LLMs at the near-current scale in a way that previous results were not.
I agree that there could still be gaps and “generality” of IMO isn’t a totalizing magic that prevents the existence of crucial remaining gaps. I’m not strongly claiming there aren’t any crucial gaps, just that with IMO as an example it’s no longer obvious there are any, at least as long as the training methods used for IMO can be adapted to those other areas, which isn’t always obviously the case. And of course continual learning could prove extremely hard. But there also isn’t strong evidence that it’s extremely hard yet, because it wasn’t a focus for very long while LLMs at current levels of capabilities were already available. And the capabilities of in-context learning with 50M token contexts and even larger LLMs haven’t been observed yet.
So it’s a question of calibration. There could always be substantial obstructions such that it’s no longer obvious that they are there even though they are. But also at some point there actually aren’t any. So always suspecting currently unobservable crucial obstructions is not the right heuristic either, the prediction of when the problem could actually be solved needs to be allowed to respond to some sort of observable evidence.
What I meant by general domain is that it’s not overly weird in the mental moves that are relevant there, so training methods that can create something that wins IMO are probably not very different from training methods that can create things that solve many other kinds of problems.
I took you to be saying
1. math is a general domain
2. IMO is fairly hard math
3. LLMs did the IMO
4. therefore LLMs can do well in a general domain
5. therefore probably maybe LLMs are generally intelligent.
But maybe you instead meant
working out math problems applying known methods is a general domain
?
Anyway, “general domain” still does not make sense here. The step from 4 to 5 is not supported by this concept of “general domain” as you’re applying it here.
It’s certainly possible—but “efficient continual learning” sounds a lot like AGI! So, to say that is the thing missing for AGI is not such a strong statement about the distance left, is it?
I don’t think this is moving goalposts on the current paradigm. The word “continual” seems to have basically replaced “online” since the rise of LLMs—perhaps because they manage a bit of in-context learning which is sort-of-online but not-quite-continual and makes a distinction necessary. However, “a system that learns efficiently over the course of its lifetime” is basically what we always expected from AGI, e.g. this is roughly what Hofstadter claimed was missing in “Fluid Concepts and Creative Analogies” as far back as 1995.
I agree that we can’t rule out roughly current scale LLMs reaching AGI. I just want to guard against the implication (which others may read into your words) that this is some kind of default expectation.
The question for this subthread is the scale of LLMs necessary for the first AGIs, and what the IMO results say about that. Continual learning through post-training doesn’t obviously require more scale, and IMO is an argument that the current scale is almost sufficient. It could be very difficult conceptually/algorithmically to figure out how to actually do continual learning with automated post-training, but that still doesn’t need to depend on more scale for the underlying LLM; that’s my point about the implications of the IMO results. Before those results, it was far less clear if the current (or near-term feasible) scale would be sufficient for the neural-net cognitive-engine part of the AGI puzzle.
It could be that LLMs can’t get there at the current scale because LLMs can’t get there at any (potentially physical) scale with the current architecture.
So in some sense, yes, that wouldn’t be a prototypical example of a scale bottleneck.
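I think this is very cogently written and I found it pretty persuasive even though our views started out fairly similar.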
I’ve still found them useful. If METR’s trend actually holds, they will indeed become increasingly more useful. If it actually holds to >1-month tasks, they may actually become transformative within the decade. Perhaps they will automate the within-paradigm AI R&D, and it will lead to a software-only Singularity that will birth an AI model capable of eradicating humanity.
But that thing will still not be an AGI. This would be the face of our extinction:
We should pause to note that a Clippy² still doesn’t really think or plan. It’s not really conscious. It is just an unfathomably vast pile of numbers produced by mindless optimization starting from a small seed program that could be written on a few pages. [...] When it ‘plans’, it would be more accurate to say it fake-plans; when it ‘learns’, it fake-learns; when it ‘thinks’, it is just interpolating between memorized data points in a high-dimensional space, and any interpretation of such fake-thoughts as real thoughts is highly misleading; when it takes ‘actions’, they are fake-actions optimizing a fake-learned fake-world, and are not real actions, any more than the people in a simulated rainstorm really get wet, rather than fake-wet. (The deaths, however, are real.)
This seems unlikely to me on balance. I think compute scaling will run out well before that. I think it’s possible to scale LLMs far enough to achieve this, but that it’s “possible” in a very useless way. A Jupiter Brain-sized LLM can likely do it (and probably just an Earth Brain-sized one), but we are not building a Jupiter Brain-sized LLM.
Uh… what? Why do you define “AGI” through its internals, and not through its capabilities? That seems to be a very strange standard, and an unhelpful one. If I didn’t have more context I’d be suspecting you of weird goalpost-moving. I personally care whether
1. AI systems are created that lead to human extinction, broadly construed, and
2. Those AI systems then, after leading to human extinction, fail to self-sustain and “go extinct” themselves
Maybe you were gesturing at AIs that result in both (1) and (2)??
And the whole reason why we talk about AGI and ASI so much here on Less Wrong dot com is because those AI systems could lead to drastic changes of the future of the universe. Otherwise we wouldn’t really be interested in them, and go back to arguing about anthropics or whatever.
Whether some system is “real” AGI based on its internals is not relevant to this question. (The internals of AI systems are of course interesting in themselves, and for many other reasons.)
(As such, I read that paragraph by gwern to be sarcastic, and mocking people who insist that it’s “not really AGI” if it doesn’t function in the way they believe it should work.)
Now, a fair question to ask here is: does this matter? If LLMs aren’t “real general intelligences”, but it’s still fairly plausible that they’re good-enough AGI approximations to drive humanity extinct, shouldn’t our policy be the same in both cases?
I think if the lightcone looks the same, it should; if it doesn’t, our policies should look different. It would matter if the resulting AIs fall over and leave the lightcone in the primordial state, which looks plausible from your view?
Why do you define “AGI” through its internals, and not through its capabilities?
For the same reason we make a distinction between the Taylor-polynomial approximation of a function, and that function itself? There is a “correct” algorithm for general intelligence, which e. g. humans use, and there are various ways to approximate it. Approximants would behave differently from the “real thing” in various ways, and it’s important to track which one you’re looking at.
It would matter if the resulting AIs fall over and leave the lightcone in the primordial state, which looks plausible from your view?
That’s not the primary difference, in my view. My view is simply that there’s a difference between an “exact AGI” and AGI-approximants at all. That difference can be characterized as “whether various agent-foundations results apply to that system universally / predict its behavior universally, or only in some limited contexts”. Is it well-modeled as an agent, or is modeling it as an agent itself a drastically flawed approximation of its behavior that breaks down in some contexts?
And yes, “falls over after the omnicide” is a plausible way for AGI-approximants to differ from an actual AGI. Say, their behavior can be coherent at timescales long enough to plot humanity’s demise (e. g., a millennium), but become incoherent at 100x that time-scale. So after a million years, their behavior devolves into noise, and they self-destruct or just enter a stasis. More likely, I would expect that if a civilization of those things ever runs into an actual general superintelligence (e. g., an alien one), the “real deal” would rip through them like it would through humanity (even if they have a massive resource advantage, e. g. 1000x more galaxies under their control).
Some other possible differences:
Capability level as a function of compute. Approximants can be drastically less compute-efficient, in a quantity-is-a-quality-of-its-own way.
Generality. Approximants can be much “spikier” in their capabilities than general intelligences, genius-level in some domains/at some problems and below-toddler in other cases, in a way that is random, or fully determined by their training data.
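Approximants’ development may lack a sharp left turn.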
Autonomy. Approximants can be incapable of autonomously extending their agency horizons, or generalizing to new domains, or deriving new domains (“innovating”) without an “actual” general intelligence to guide them.
(Which doesn’t necessarily prevent them from destroying humanity. Do you need to “innovate” to do so? I don’t know that you do.)
None of this may matter for the question of “can this thing kill humanity?”. But, like, I think there’s a real difference there that cuts reality at the joints, and tracking it is important.
In particular, it’s important for the purposes of evaluating whether we in fact do live in a world where a specific AGI-approximant scales to omnicide given realistic quantities of compute, or whether we don’t.
For example, consider @Daniel Kokotajlo’s expectation that METR’s task horizons would go superexponential at some point. I think it’s quite likely if the LLM paradigm scales to an actual AGI, if it is really acquiring the same generality/agency skills humans have. If not, however, if it is merely some domain-limited approximation of AGI, it may stay at the current trend forever (or, well, as long as the inputs keep scaling the same way), no matter how counter-intuitive that may feel.
Indeed, “how much can we rely on our intuitions regarding agents/humans when making predictions about a given AI paradigm?” may be a good way to characterize the difference here. Porting intuitions about agents lets us make predictions like “superexponential task-horizon scaling” and “the sharp left turn” and “obviously the X amount of compute and data ought to be enough for superintelligence”. But if the underlying system is not a “real” agent in many kinds of ways, those predictions would start becoming suspect. (And I would say the current paradigm’s trajectory has been quite counter-intuitive, from this perspective. My intuitions broke around GPT-3.5, and my current models are an attempt to deal with that.)
This, again, may not actually bear on the question of whether the given system scales to omnicide. The AGI approximant’s scaling laws may still be powerful enough. But it determines what hypotheses we should be tracking about this, what observations we should be looking for, and how we should update on them.
And the whole reason why we talk about AGI and ASI so much here on Less Wrong dot com is because those AI systems could lead to drastic changes of the future of the universe.
Things worth talking about are the things that can lead to drastic changes of the future of the universe, yes. And AGI is one of those things, so we should talk about it. But I think defining “is an AGI” as “any system that can lead to drastic changes of the future of the universe” is silly, no? Words mean things. A sufficiently big antimatter bomb is not an AGI, superviruses are not AGIs, and LLMs can be omnicide-capable (and therefore worth talking about) without being AGIs as well.
For AGIs and agents, many approximations are interchangeable with the real thing, because they are capable of creating the real thing in the world as a separate construct, or converging to it in behavior. Human decisions for example are noisy and imprecise, but for mathematical or engineering questions it’s possible to converge on arbitrary certainty and precision. In a similar way, humans are approximations to superintelligence, even though not themselves superintelligence.
Thus many AGI-approximations may be capable of becoming or creating the real thing eventually. Not everything converges, but the distinction should be about that, not about already being there.
Right, this helps. I guess I don’t want to fight about definitions here. I’d just say “ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it” would fulfill the sufficiency criterion for AGI_niplav, and then there’s different classes of algorithms/learners/architectures that fulfill that criterion, and have different properties.
(I wouldn’t even say that “can omnicide us” is necessary for AGI_niplav membership—“my AGI timelines are −3 years” (30%).)
One crux here may be that you are more certain that “AGI” is a thing? My intuition goes more in the direction of “there’s tons of different cognitive algorithms, with different properties; among the computable ones, they’re on a high-dimensional set of spectra, some of which in aggregate may be called ‘generality’.”
I think no-free-lunch theorems point at this, as well as the conclusions from this post. Solomonoff inductors’ beliefs look like they’d be messy and noisy, and current neural networks look messy and noisy too. I personally would find it more beautiful and nice if Thinking was a Thing, but I’ve received more evidence I interpret as “it’s actually not”.
But my questions have been answered to the degree I wanted them answered, thanks :-)
ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it would fulfill the sufficiency criterion for AGI_niplav
A flight control program directing an asteroid-redirection rocket, programmed to find a large asteroid and steer it to crash into Earth, seems like the sort of thing which could be “software that you can run on computers that can cause the extinction of humanity” but not “AGI”.
I think it’s relevant that “kill all humans” is a much easier target than “kill all humans in such a way that you can persist and grow indefinitely without them”.
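Yes, and this might be a crux between “successionists” and “doomers” with highly cosmopolitan values.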
I’d just say “ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it” would fulfill the sufficiency criterion for AGI_niplav
Fair.
One crux here may be that you are more certain that “AGI” is a thing?
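Yup.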
I’ve still found them useful. If METR’s trend actually holds, they will indeed become increasingly more useful. If it actually holds to >1-month tasks, they may actually become transformative within the decade. Perhaps they will automate the within-paradigm AI R&D[1], and it will lead to a software-only Singularity that will birth an AI model capable of eradicating humanity.
But that thing will still not be an AGI.
No offense, but to me it seems like you are being overly pedantic with a term that most people use differently. If you surveyed people on lesswrong, as well as AI researchers, I’m pretty sure almost everyone (>90% of people) would call an AI model capable enough to eradicate humanity an AGI.
If you surveyed people on lesswrong, as well as AI researchers, I’m pretty sure almost everyone (>90% of people) would call an AI model capable enough to eradicate humanity an AGI
Well, if so, I think they would be very wrong. See a more in-depth response here.
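Yeah, this seems like a standard dispute over words, which the sequence posts Disguised Queries and Disputing Definitions already solved.
I’ll also link my own comment on how what TsviBT is doing is making AI discourse worse, because it promotes the incorrect binary frame and dispromotes the correct continuous frame around AI progress.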
AI discourse doesn’t get enough TsviBT-like vitamins, so their projected toxicity if overdosed is not relevant. A lot of interventions are good in moderation, so arguments about harm from saturation are often counterproductive if taken as a call to any sort of immediately relevant action rather than theoretical notes about hypothetical future conditions.
I disagree with this. In particular, the move TsviBT makes of arguing that today’s AI has basically zero relevance to what AGI needs to have, of claiming that LP25 programs aren’t actually creative, and more generally of setting up a hard border between today’s AI and AGI, already makes up a huge amount of AI discourse, especially in claims that AI will soon hit a wall for xyz reasons.