Lots I disagree with, let’s go point by point.
It’s possible that we should already be thinking of GPT-4 as “AGI” on some definitions, so to be clear about the threshold of generality I have in mind, I’ll specifically talk about “STEM-level AGI”, though I expect such systems to be good at non-STEM tasks too.
This motte/bailey shows up in almost every argument for AI doom. People point out that current AI systems are obviously safe, and the response is “not THOSE AI systems, the future much more dangerous ones.” I don’t find this kind of speculation helpful or informative.
Human bodies and the food, water, air, sunlight, etc. we need to live are resources (“you are made of atoms the AI can use for something else”); and we’re also potential threats (e.g., we could build a rival superintelligent AI that executes a totally different plan).
This is an incredibly generic claim to the point of being meaningless. There is a reason why most human plans don’t start with “wipe out all of my potential competitors”.
Current ML work is on track to produce things that are, in the ways that matter, more like “randomly sampled plans” than like “the sorts of plans a civilization of human von Neumanns would produce”. (Before we’re anywhere near being able to produce the latter sorts of things.)
If by current ML you mean GPT, then it is LITERALLY trained to imitate humans (by predicting the next token). If by current ML you mean something else, say what you mean. I don’t think anyone is building an AI that randomly samples plans, as that would be exceedingly inefficient.
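(For concreteness, here is a minimal sketch of that next-token objective, written in PyTorch with a hypothetical `model` and token batch; this is illustrative, not any lab’s actual training code:)

```python
import torch.nn.functional as F

def next_token_loss(model, tokens):
    # tokens: (batch, seq_len) token IDs drawn from human-written text.
    inputs = tokens[:, :-1]    # everything except the last token
    targets = tokens[:, 1:]    # the same text shifted left by one
    logits = model(inputs)     # (batch, seq_len - 1, vocab_size)
    # Cross-entropy against the token a human actually wrote next --
    # the model is scored on how well it imitates the human continuation.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```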
The key differences between humans and “things that are more easily approximated as random search processes than as humans-plus-a-bit-of-noise” lie in lots of complicated machinery in the human brain.
See the above objection. But also, just because humans are complex biological things doesn’t mean approximating our behaviors is similarly complex.
STEM-level AGI timelines don’t look that long (e.g., probably not 50 or 150 years; could well be 5 years or 15).
Sure. But I don’t think there’s some magical “level” where AI suddenly goes Foom. STEM-level AI will seem somewhat unimpressive (“hasn’t AI been able to do that for years?”) when it finally arrives. AI is already writing code, designing experiments, etc. There’s nothing special about STEM level as you’ve defined it. BabyAGI could already be “STEM level”, and it wouldn’t change the fact that it’s currently not remotely a threat to humanity.
We don’t currently know how to do alignment, we don’t seem to have a much better idea now than we did 10 years ago, and there are many large novel visible difficulties.
I literally just wrote an essay about this.
We should be starting with a pessimistic prior about achieving reliably good behavior in any complex safety-critical software, particularly if the software is novel.
You can start with whatever prior you want. The point of being a good Bayesian is updating your prior when you get new information. What information about current ML state of the art caused you to update this way?
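(Just so we’re talking about the same thing: the update being asked for is plain Bayes’ rule. A toy worked example with made-up numbers, purely to illustrate the prior-to-posterior mechanics:)

```python
# Made-up numbers, purely to illustrate the prior -> posterior mechanics.
prior = 0.5              # P(H): credence before seeing the evidence
p_e_given_h = 0.9        # P(E | H)
p_e_given_not_h = 0.3    # P(E | not H)

p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
posterior = prior * p_e_given_h / p_e    # P(H | E) = 0.75
```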
Neither ML nor the larger world is currently taking this seriously, as of April 2023.
The people taking this “seriously” don’t seem to be doing a better job than the rest of us.
As noted above, current ML is very opaque, and it mostly lets you intervene on behavioral proxies for what we want, rather than letting us directly design desirable features.
Again, I’m begging you to actually learn something about the current state of the art instead of making claims!
There are lots of specific abilities which seem like they ought to be possible for the kind of civilization that can safely deploy smarter-than-human optimization, that are far out of reach, with no obvious path forward for achieving them with opaque deep nets even if we had unlimited time to work on some relatively concrete set of research directions.
C’mon! You’re telling me that you have a specific list of capabilities that you know a “surviving civilization” would be developing and you chose to write this post instead of a detailed explanation of each and every one of those capabilities?
It’s possible that we should already be thinking of GPT-4 as “AGI” on some definitions, so to be clear about the threshold of generality I have in mind, I’ll specifically talk about “STEM-level AGI”, though I expect such systems to be good at non-STEM tasks too.
This motte/bailey shows up in almost every argument for AI doom. People point out that current AI systems are obviously safe, and the response is “not THOSE AI systems, the future much more dangerous ones.” I don’t find this kind of speculation helpful or informative.
This doesn’t seem true to me, at least if we’re restricting it to those arguments made by people who’ve actually thought about the issue more than the average Twitter poster. The concern has always been with AGI systems which are superhuman across a broad range of capabilities, specifically including those which are necessary for achieving difficult goals over long time horizons.
Human bodies and the food, water, air, sunlight, etc. we need to live are resources (“you are made of atoms the AI can use for something else”); and we’re also potential threats (e.g., we could build a rival superintelligent AI that executes a totally different plan).
This is an incredibly generic claim to the point of being meaningless. There is a reason why most human plans don’t start with “wipe out all of my potential competitors”.
From Rob’s first footnote:
Also, I wrote this post to summarize my own top reasons for being worried, not to try to make a maximally compelling or digestible case for others. I don’t expect others to be similarly confident based on such a quick overview, unless perhaps you’ve read other sources on AI risk in the past.
But aside from that, how is the claim generic or meaningless? To the contrary, it seems quite specific to me—it’s pointing out specific reasons why a plan that wasn’t very narrowly optimized for human survival & flourishing would, by default, not include those things in the world it produced.
Current ML work is on track to produce things that are, in the ways that matter, more like “randomly sampled plans” than like “the sorts of plans a civilization of human von Neumanns would produce”. (Before we’re anywhere near being able to produce the latter sorts of things.)
If by current ML you mean GPT, then it is LITERALLY trained to imitate humans (by predicting the next token). If by current ML you mean something else, say what you mean. I don’t think anyone is building an AI that randomly samples plans, as that would be exceedingly inefficient.
The task of predicting the next token does not seem like it would lead to cognition that resembles “very smart humans thinking smart-human-shaped thoughts”.
The key differences between humans and “things that are more easily approximated as random search processes than as humans-plus-a-bit-of-noise” lie in lots of complicated machinery in the human brain.
See the above objection. But also, just because humans are complex biological things doesn’t mean approximating our behaviors is similarly complex.
This doesn’t seem to engage at all with the actual details laid out for that point.
We should be starting with a pessimistic prior about achieving reliably good behavior in any complex safety-critical software, particularly if the software is novel.
You can start with whatever prior you want. The point of being a good Bayesian is updating your prior when you get new information. What information about current ML state of the art caused you to update this way?
I don’t see any update being described in the text you quote, but of course there’s relevant detail described later in that point, which, idk, I sort of assume you’d have mentioned if you’d read it?
Neither ML nor the larger world is currently taking this seriously, as of April 2023.
The people taking this “seriously” don’t seem to be doing a better job than the rest of us.
Is this supposed to be an argument?
As noted above, current ML is very opaque, and it mostly lets you intervene on behavioral proxies for what we want, rather than letting us directly design desirable features.
Again, I’m begging you to actually learn something about the current state of the art instead of making claims!
I read your post and it does not describe a way for us to “directly design desirable features” in our current ML paradigm. I think “current ML is very opaque” is a very accurate summary of our understanding of how current ML systems perform complicated cognitive tasks. (We’ve gotten about as far as figuring out how a toy network performs modular addition.)
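(For concreteness, the “toy network” setup I have in mind is roughly the following sketch, with illustrative hyperparameters rather than the exact ones from that work; the hard part, reverse-engineering what the trained network is doing internally, is not shown here:)

```python
import torch
import torch.nn as nn

P = 113
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))  # all (a, b)
labels = (pairs[:, 0] + pairs[:, 1]) % P                        # (a + b) mod P

# A deliberately tiny model trained to do modular addition from examples.
model = nn.Sequential(
    nn.Embedding(P, 128),       # shared embedding for a and b
    nn.Flatten(start_dim=1),    # concatenate the two embeddings -> 256 dims
    nn.Linear(256, 512),
    nn.ReLU(),
    nn.Linear(512, P),          # logits over the P possible answers
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(5_000):
    loss = nn.functional.cross_entropy(model(pairs), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```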
I read your post and it does not describe a way for us to “directly design desirable features” in our current ML paradigm. I think “current ML is very opaque” is a very accurate summary of our understanding of how current ML systems perform complicated cognitive tasks. (We’ve gotten about as far as figuring out how a toy network performs modular addition.)
How familiar are you with LoRAs, textual inversion, latent-space translations, and the like? Because these are all techniques invented within the last year that allow us to directly add (or subtract) features from neural networks in a way that is very easy and natural for humans to work with. Want to teach your AI what “modern Disney style” animation looks like? It sounds like a horribly abstract and complicated concept, but we can now explain it to an AI in a process that takes under an hour, needs a few megabytes of storage, and can be reused across a wide variety of neural networks. This paper in particular is fantastic because it allows you to define “beauty” in terms of “I don’t know what it is, but I know it when I see it” and turn it into a concrete representation.
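To make “directly add (or subtract) features” concrete, here is a minimal sketch of the LoRA idea; the names and shapes are illustrative, not any particular library’s implementation:

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen pretrained linear layer with a trainable low-rank delta."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # the pretrained weights never change
        self.down = nn.Linear(base.in_features, rank, bias=False)   # "A"
        self.up = nn.Linear(rank, base.out_features, bias=False)    # "B"
        nn.init.zeros_(self.up.weight)       # start as an exact no-op
        self.alpha = alpha                   # dial the learned concept up or down

    def forward(self, x):
        # base(x) + alpha * B(A(x)): only A and B are trained, they fit in a
        # few megabytes, and the same delta can be moved between models that
        # share this layer's shape.
        return self.base(x) + self.alpha * self.up(self.down(x))
```

Setting `alpha` negative is the “subtract the feature” case; because the base weights stay frozen, the learned delta is the small, portable artifact described above.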
That does indeed seem like some progress, though note that it does not really let us answer questions like “what algorithm is this NN performing that lets it do whatever it’s doing?” to a degree of understanding sufficient to implement that algorithm directly (or even a simpler approximation of it that would still be meaningfully better than the previous state of the art, if we restrict ourselves to “hand-written code” rather than an ML model).
I think that to the extent we need to answer “what algorithm?” style questions, we will do it with techniques like this one where we just have the AI write code.
But I don’t think “what algorithm?” is a meaningful question to ask regarding “Modern Disney Style”; the question is too abstract to have a clean-cut definition in terms of human-readable code. It’s sufficient that we can define and use the concept, given a handful of exemplars, in a way that intuitively agrees with human perception of what those words should mean.