I’ve still found them useful. If METR’s trend actually holds, they will indeed become increasingly useful. If it holds all the way to >1-month tasks, they may become transformative within the decade. Perhaps they will automate within-paradigm AI R&D, leading to a software-only Singularity that births an AI model capable of eradicating humanity.
But that thing will still not be an AGI. This would be the face of our extinction:
We should pause to note that a Clippy² still doesn’t really think or plan. It’s not really conscious. It is just an unfathomably vast pile of numbers produced by mindless optimization starting from a small seed program that could be written on a few pages. [...] When it ‘plans’, it would be more accurate to say it fake-plans; when it ‘learns’, it fake-learns; when it ‘thinks’, it is just interpolating between memorized data points in a high-dimensional space, and any interpretation of such fake-thoughts as real thoughts is highly misleading; when it takes ‘actions’, they are fake-actions optimizing a fake-learned fake-world, and are not real actions, any more than the people in a simulated rainstorm really get wet, rather than fake-wet. (The deaths, however, are real.)
This seems unlikely to me on balance. I think compute scaling will run out well before that. I think it’s possible to scale LLMs far enough to achieve this, but that it’s “possible” in a very useless way. A Jupiter Brain-sized LLM can likely do it (and probably just an Earth Brain-sized one), but we are not building a Jupiter Brain-sized LLM.
Uh… what? Why do you define “AGI” through its internals, and not through its capabilities? That seems to be a very strange standard, and an unhelpful one. If I didn’t have more context, I’d suspect you of weird goalpost-moving. I personally care whether:
1. AI systems are created that lead to human extinction, broadly construed, and
2. those AI systems then, after leading to human extinction, fail to self-sustain and “go extinct” themselves.
Maybe you were gesturing at AIs that result in both (1) and (2)??
And the whole reason why we talk about AGI and ASI so much here on Less Wrong dot com is because those AI systems could lead to drastic changes of the future of the universe. Otherwise we wouldn’t really be interested in them, and would go back to arguing about anthropics or whatever.
Whether some system is “real” AGI based on its internals is not relevant to this question. (The internals of AI systems are of course interesting in themselves, and for many other reasons.)
(As such, I read that paragraph by gwern to be sarcastic, and mocking people who insist that it’s “not really AGI” if it doesn’t function in the way they believe it should work.)
Now, a fair question to ask here is: does this matter? If LLMs aren’t “real general intelligences”, but it’s still fairly plausible that they’re good-enough AGI approximations to drive humanity extinct, shouldn’t our policy be the same in both cases?
I think that if the lightcone looks the same, it should; if it doesn’t, our policies should look different. It would matter if the resulting AIs fall over and leave the lightcone in the primordial state, which looks plausible from your view?
Why do you define “AGI” through its internals, and not through its capabilities?
For the same reason we make a distinction between the Taylor-polynomial approximation of a function, and that function itself? There is a “correct” algorithm for general intelligence, which e. g. humans use, and there are various ways to approximate it. Approximants would behave differently from the “real thing” in various ways, and it’s important to track which one you’re looking at.
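(As a minimal numerical illustration of the Taylor-polynomial analogy, not anything from the original comment: a truncated series is indistinguishable from the real function near its expansion point and wildly wrong far from it, which is the sense of “approximant” being invoked. The helper `taylor_sin` and the sample points are illustrative choices.)

```python
import math

def taylor_sin(x, n_terms=3):
    """Truncated Taylor series of sin(x) around 0: x - x^3/3! + x^5/5! - ..."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n_terms))

for x in [0.1, 1.0, 3.0, 10.0]:
    approx, exact = taylor_sin(x), math.sin(x)
    # Near 0 the approximant tracks sin(x) closely; at x=10 it is off by roughly 677.
    print(f"x={x:5.1f}  taylor={approx:10.3f}  sin={exact:7.3f}  error={abs(approx - exact):.2e}")
```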
It would matter if the resulting AIs fall over and leave the lightcone in the primordial state, which looks plausible from your view?
That’s not the primary difference, in my view. My view is simply that there is a difference at all between an “exact AGI” and AGI-approximants. That difference can be characterized as “whether various agent-foundations results apply to that system universally / predict its behavior universally, or only in some limited contexts”. Is it well-modeled as an agent, or is modeling it as an agent itself a drastically flawed approximation of its behavior that breaks down in some contexts?
And yes, “falls over after the omnicide” is a plausible way for AGI-approximants to differ from an actual AGI. Say, their behavior can be coherent at timescales long enough to plot humanity’s demise (e. g., a millennium), but become incoherent at 1000x that time-scale. So after a million years, their behavior devolves into noise, and they self-destruct or just enter a stasis. More likely, I would expect that if a civilization of those things ever runs into an actual general superintelligence (e. g., an alien one), the “real deal” would rip through them like it would through humanity (even if they have a massive resource advantage, e. g. 1000x more galaxies under their control).
Some other possible differences:
Capability level as a function of compute. Approximants can be drastically less compute-efficient, in a quantity-is-a-quality-of-its-own way.
Generality. Approximants can be much “spikier” in their capabilities than general intelligences, genius-level in some domains/at some problems and below-toddler in other cases, in a way that is random, or fully determined by their training data.
Approximants’ development may lack a sharp left turn.
Autonomy. Approximants can be incapable of autonomously extending their agency horizons, or generalizing to new domains, or deriving new domains (“innovating”) without an “actual” general intelligence to guide them.
(Which doesn’t necessarily prevent them from destroying humanity. Do you need to “innovate” to do so? I don’t know that you do.)
None of this may matter for the question of “can this thing kill humanity?”. But, like, I think there’s a real difference there that cuts reality at the joints, and tracking it is important.
In particular, it’s important for the purposes of evaluating whether we in fact do live in a world where a specific AGI-approximant scales to omnicide given realistic quantities of compute, or whether we don’t.
For example, consider @Daniel Kokotajlo’s expectation that METR’s task horizons would go superexponential at some point. I think that’s quite likely if the LLM paradigm scales to an actual AGI, i. e. if it is really acquiring the same generality/agency skills humans have. If not, however, if it is merely some domain-limited approximation of AGI, it may stay on the current trend forever (or, well, for as long as the inputs keep scaling the same way), no matter how counter-intuitive that may feel.
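(A toy sketch of the two hypotheses being contrasted, with made-up numbers rather than METR’s actual fit: under the plain exponential trend the task horizon doubles on a fixed schedule, while under the superexponential hypothesis the doubling time itself keeps shrinking. `H0`, `DOUBLING`, and `shrink` are all illustrative assumptions.)

```python
H0 = 1.0          # current task horizon in hours (assumed, illustrative)
DOUBLING = 7.0    # months per doubling for the plain exponential trend (assumed)

def exponential_horizon(t_months):
    # Constant doubling time: a straight line on a log plot.
    return H0 * 2 ** (t_months / DOUBLING)

def superexponential_horizon(t_months, shrink=0.9):
    # Each doubling takes 10% less time than the previous one: the log-plot curve bends
    # upward, and the horizon diverges in finite time (here at 7 / (1 - 0.9) = 70 months).
    horizon, doubling, t = H0, DOUBLING, 0.0
    while t + doubling <= t_months:
        t, horizon, doubling = t + doubling, horizon * 2, doubling * shrink
    return horizon * 2 ** ((t_months - t) / doubling)

for months in (12, 36, 60):
    print(f"{months:3d} mo: exponential {exponential_horizon(months):9.1f} h, "
          f"superexponential {superexponential_horizon(months):12.1f} h")
```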
Indeed, “how much can we rely on our intuitions regarding agents/humans when making predictions about a given AI paradigm?” may be a good way to characterize the difference here. Porting intuitions about agents lets us make predictions like “superexponential task-horizon scaling” and “the sharp left turn” and “obviously the X amount of compute and data ought to be enough for superintelligence”. But if the underlying system is not a “real” agent in many kinds of ways, those predictions would start becoming suspect. (And I would say the current paradigm’s trajectory has been quite counter-intuitive, from this perspective. My intuitions broke around GPT-3.5, and my current models are an attempt to deal with that.)
This, again, may not actually bear on the question of whether the given system scales to omnicide. The AGI approximant’s scaling laws may still be powerful enough. But it determines what hypotheses we should be tracking about this, what observations we should be looking for, and how we should update on them.
And the whole reason why we talk about AGI and ASI so much here on Less Wrong dot com is because those AI systems could lead to drastic changes of the future of the universe.
Things worth talking about are the things that can lead to drastic changes of the future of the universe, yes. And AGI is one of those things, so we should talk about it. But I think defining “is an AGI” as “any system that can lead to drastic changes of the future of the universe” is silly, no? Words mean things. A sufficiently big antimatter bomb is not an AGI, superviruses are not AGIs, and LLMs can be omnicide-capable (and therefore worth talking about) without being AGIs as well.
For AGIs and agents, many approximations are interchangeable with the real thing, because they are capable of creating the real thing in the world as a separate construct, or converging to it in behavior. Human decisions for example are noisy and imprecise, but for mathematical or engineering questions it’s possible to converge on arbitrary certainty and precision. In a similar way, humans are approximations to superintelligence, even though not themselves superintelligence.
Thus many AGI-approximations may be capable of becoming or creating the real thing eventually. Not everything converges, but the distinction should be about that, not about already being there.
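(A minimal sketch of the “noisy approximations can still converge on arbitrary precision” point, with illustrative numbers that are not from the comment: averaging enough independent noisy estimates of a fixed quantity shrinks the error like 1/sqrt(n), so any target precision is reachable with enough samples. `TRUE_VALUE` and `NOISE` are arbitrary choices.)

```python
import math
import random

TRUE_VALUE = math.pi   # the quantity being estimated (arbitrary choice)
NOISE = 0.5            # standard deviation of each individual noisy "judgment"

def noisy_estimate():
    return TRUE_VALUE + random.gauss(0.0, NOISE)

for n in (10, 1_000, 100_000):
    avg = sum(noisy_estimate() for _ in range(n)) / n
    # Expected error scales like NOISE / sqrt(n): crank n up and precision keeps improving.
    print(f"n={n:>7}  estimate={avg:.5f}  error={abs(avg - TRUE_VALUE):.5f}")
```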
Right, this helps. I guess I don’t want to fight about definitions here. I’d just say “ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it” would fulfill the sufficiency criterion for AGI_niplav, and then there are different classes of algorithms/learners/architectures that fulfill that criterion and have different properties.
(I wouldn’t even say that “can omnicide us” is necessary for AGI_niplav membership; I’d put 30% on “my AGI timelines are −3 years”.)
One crux here may be that you are more certain that “AGI” is a thing? My intuition goes more in the direction of “there are tons of different cognitive algorithms with different properties; among the computable ones, they lie on a high-dimensional set of spectra, some of which in aggregate may be called ‘generality’.”
I think no free lunch theorems point at this, as well as the conclusions from this post. Solomonoff inductors’ beliefs look like they’d be messy and noisy, and current neural networks look messy and noisy too. I personally would find it more beautiful and nice if Thinking were a Thing, but I’ve received more evidence I interpret as “it’s actually not”.
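(For reference, and not something the comment spells out: the Wolpert–Macready no-free-lunch result says that, summed over all objective functions, any two search algorithms see the same distribution of outcomes; the notation below follows their formulation.)

```latex
% No Free Lunch (Wolpert & Macready, 1997), informally transcribed:
% for any two search algorithms a_1, a_2 and any number of distinct evaluations m,
\sum_{f} P\!\left(d^{y}_{m} \mid f, m, a_1\right) \;=\; \sum_{f} P\!\left(d^{y}_{m} \mid f, m, a_2\right)
% where the sum ranges over all objective functions f : X -> Y, and d^y_m is the
% sequence of objective values observed after m evaluations.
```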
But my questions have been answered to the degree I wanted them answered, thanks :-)
ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it would fulfill the sufficiency criterion for AGI_niplav
A flight control program directing an asteroid redirection rocket, programmed to find a large asteroid and steer it to crash into Earth, seems like the sort of thing which could be “software that you can run on computers that can cause the extinction of humanity” but not “AGI”.
I think it’s relevant that “kill all humans” is a much easier target than “kill all humans in such a way that you can persist and grow indefinitely without them”.
Yes, and this might be a crux between “successionists” and “doomers” with highly cosmopolitan values.
I’d just say “ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it” would fulfill the sufficiency criterion for AGI_niplav
Fair.
One crux here may be that you are more certain that “AGI” is a thing?
Yup.