Why do you define “AGI” through its internals, and not through its capabilities?
For the same reason we make a distinction between the Taylor-polynomial approximation of a function, and that function itself? There is a “correct” algorithm for general intelligence, which e. g. humans use, and there are various ways to approximate it. Approximants would behave differently from the “real thing” in various ways, and it’s important to track which one you’re looking at.
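As a minimal numerical sketch of that analogy (the function and polynomial degree are just illustrative choices): a low-order Taylor polynomial is nearly indistinguishable from sin(x) close to the expansion point and wildly wrong far from it, which is the sense in which an approximant can match the “real thing” in some contexts and break down in others.

```python
import math

def taylor_sin(x, order=7):
    """Taylor polynomial of sin around 0, truncated at the given (odd) order."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(order // 2 + 1))

for x in (0.5, 2.0, 10.0):
    print(f"x = {x:5.1f}   sin(x) = {math.sin(x):+8.4f}   degree-7 Taylor = {taylor_sin(x):+10.2f}")

# Near x = 0 the two agree to several decimal places; at x = 10 the
# polynomial returns roughly -1307 while sin(10) is about -0.54.
```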
It would matter if the resulting AIs fall over and leave the lightcone in the primordial state, which looks plausible from your view?
That’s not the primary difference, in my view. My view is simply that there’s a difference between an “exact AGI” and AGI-approximants at all. That difference can be characterized as “whether various agent-foundations results apply to that system universally / predict its behavior universally, or only in some limited contexts”. Is it well-modeled as an agent, or is modeling it as an agent itself a drastically flawed approximation of its behavior that breaks down in some contexts?
And yes, “falls over after the omnicide” is a plausible way for AGI-approximants to differ from an actual AGI. Say, their behavior can be coherent at timescales long enough to plot humanity’s demise (e. g., a millennium), but become incoherent at 1000x that timescale. So after a million years, their behavior devolves into noise, and they self-destruct or just enter a stasis. More likely, I would expect that if a civilization of those things ever runs into an actual general superintelligence (e. g., an alien one), the “real deal” would rip through them like it would through humanity (even if they have a massive resource advantage, e. g. 1000x more galaxies under their control).
Some other possible differences:
Capability level as a function of compute. Approximants can be drastically less compute-efficient, in a quantity-is-a-quality-of-its-own way.
Generality. Approximants can be much “spikier” in their capabilities than general intelligences, genius-level in some domains/at some problems and below-toddler in other cases, in a way that is random, or fully determined by their training data.
Approximants’ development may lack a sharp left turn.
Autonomy. Approximants can be incapable of autonomously extending their agency horizons, or generalizing to new domains, or deriving new domains (“innovating”) without an “actual” general intelligence to guide them.
(Which doesn’t necessarily prevent them from destroying humanity. Do you need to “innovate” to do so? I don’t know that you do.)
None of this may matter for the question of “can this thing kill humanity?”. But, like, I think there’s a real difference there that cuts reality at the joints, and tracking it is important.
In particular, it’s important for the purposes of evaluating whether we in fact do live in a world where a specific AGI-approximant scales to omnicide given realistic quantities of compute, or whether we don’t.
For example, consider @Daniel Kokotajlo’s expectation that METR’s task horizons would go superexponential at some point. I think it’s quite likely if the LLM paradigm scales to an actual AGI, if it is really acquiring the same generality/agency skills humans have. If not, however, if it is merely some domain-limited approximation of AGI, it may stay at the current trend forever (or, well, as long as the inputs keep scaling the same way), no matter how counter-intuitive that may feel.
Indeed, “how much can we rely on our intuitions regarding agents/humans when making predictions about a given AI paradigm?” may be a good way to characterize the difference here. Porting intuitions about agents lets us make predictions like “superexponential task-horizon scaling” and “the sharp left turn” and “obviously X amount of compute and data ought to be enough for superintelligence”. But if the underlying system is not a “real” agent in many kinds of ways, those predictions would start becoming suspect. (And I would say the current paradigm’s trajectory has been quite counter-intuitive, from this perspective. My intuitions broke around GPT-3.5, and my current models are an attempt to deal with that.)
This, again, may not actually bear on the question of whether the given system scales to omnicide. The AGI approximant’s scaling laws may still be powerful enough. But it determines what hypotheses we should be tracking about this, what observations we should be looking for, and how we should update on them.
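To make the contrast concrete, here is a toy sketch of the two trajectories discussed above: an exponential trend with a fixed doubling time versus a superexponential one whose doubling time keeps shrinking. The starting horizon, the 7-month doubling time, and the shrink factor are made-up illustrative parameters, not METR’s actual fit or anyone’s forecast.

```python
# Toy contrast between the two hypotheses; every number here is illustrative.

def horizon_exponential(months, h0=1.0, doubling_months=7.0):
    """Task horizon (hours) if the doubling time stays constant."""
    return h0 * 2 ** (months / doubling_months)

def horizon_superexponential(months, h0=1.0, d0=7.0, shrink=0.9):
    """Task horizon if each successive doubling takes `shrink` times as long
    as the previous one (doublings applied at discrete completion times)."""
    horizon, t, d = h0, 0.0, d0
    while t + d <= months:
        t += d
        horizon *= 2
        d *= shrink
    return horizon

for m in (12, 36, 60):
    print(f"month {m:2d}:  exponential ~ {horizon_exponential(m):9.0f} h   "
          f"superexponential ~ {horizon_superexponential(m):9.0f} h")

# With shrink < 1 the doubling times form a geometric series, so the
# superexponential horizon diverges in finite time (here around month 70),
# while the constant-doubling-time trend just keeps growing exponentially.
```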
And the whole reason why we talk about AGI and ASI so much here on Less Wrong dot com is because those AI systems could lead to drastic changes of the future of the universe.
Things worth talking about are the things that can lead to drastic changes of the future of the universe, yes. And AGI is one of those things, so we should talk about it. But I think defining “is an AGI” as “any system that can lead to drastic changes of the future of the universe” is silly, no? Words mean things. A sufficiently big antimatter bomb is not an AGI, superviruses are not AGIs, and LLMs can be omnicide-capable (and therefore worth talking about) without being AGIs as well.
For AGIs and agents, many approximations are interchangeable with the real thing, because they are capable of creating the real thing in the world as a separate construct, or converging to it in behavior. Human decisions, for example, are noisy and imprecise, but for mathematical or engineering questions it’s possible to converge on arbitrary certainty and precision. In a similar way, humans are approximations to superintelligence, even though not themselves superintelligence.
Thus many AGI-approximations may be capable of becoming or creating the real thing eventually. Not everything converges, but the distinction should be about that, not about already being there.
Right, this helps. I guess I don’t want to fight about definitions here. I’d just say “ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it” would fulfill the sufficiency criterion for AGI_niplav, and then there’s different classes of algorithms/learners/architectures that fulfill that criterion, and have different properties.
(I wouldn’t even say that “can omnicide us” is necessary for AGI_niplav membership—“my AGI timelines are −3 years” (30%).)
One crux here may be that you are more certain that “AGI” is a thing? My intuition goes more in the direction of “there’s tons of different cognitive algorithms, with different properties; among the computable ones, they’re on a high-dimensional set of spectra, some of which in aggregate may be called ‘generality’.”
I think no-free-lunch theorems point at this, as well as the conclusions from this post. Solomonoff inductors’ beliefs look like they’d be messy and noisy, and current neural networks look messy and noisy too. I personally would find it more beautiful and nice if Thinking were a Thing, but I’ve received more evidence I interpret as “it’s actually not”.
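A toy illustration of the flavor of those no-free-lunch results (a tiny made-up search problem, not the theorem itself): over all possible functions from a three-element domain to {0, 1}, any fixed non-repeating query order has exactly the same distribution of “queries until a 1 is found”, so no search strategy beats another on average across all worlds.

```python
from collections import Counter
from itertools import product

DOMAIN = (0, 1, 2)

def queries_to_find_one(f, order):
    """Queries until the search order first hits a point where f equals 1 ('never' if none)."""
    for i, x in enumerate(order, start=1):
        if f[x] == 1:
            return i
    return "never"

def outcome_distribution(order):
    """Count each outcome over every possible function f: DOMAIN -> {0, 1}."""
    return Counter(queries_to_find_one(dict(zip(DOMAIN, values)), order)
                   for values in product((0, 1), repeat=len(DOMAIN)))

# Two different fixed search strategies: left-to-right vs. right-to-left.
print(outcome_distribution((0, 1, 2)))
print(outcome_distribution((2, 1, 0)))
# Both print the same counts ({1: 4, 2: 2, 3: 1, 'never': 1}): averaged over
# all possible "worlds", neither query order does better than the other.
```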
But my questions have been answered to the degree I wanted them answered, thanks :-)
ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it would fulfill the sufficiency criterion for AGI_niplav
A flight control program directing an asteroid-redirection rocket, programmed to find a large asteroid and steer it to crash into Earth, seems like the sort of thing which could be “software that you can run on computers that can cause the extinction of humanity” but not “AGI”.
I think it’s relevant that “kill all humans” is a much easier target than “kill all humans in such a way that you can persist and grow indefinitely without them”.
Yes, and this might be a crux between “successionists” and “doomers” with highly cosmopolitan values.
I’d just say “ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it” would fulfill the sufficiency criterion for AGI_niplav
Fair.
One crux here may be that you are more certain that “AGI” is a thing?
Yup.