It’s possible that we should already be thinking of GPT-4 as “AGI” on some definitions, so to be clear about the threshold of generality I have in mind, I’ll specifically talk about “STEM-level AGI”, though I expect such systems to be good at non-STEM tasks too.
This motte/bailey shows up in almost every argument for AI doom. People point out that current AI systems are obviously safe, and the response is “not THOSE AI systems, the future much more dangerous ones.” I don’t find this kind of speculation helpful or informative.
This doesn’t seem true to me, at least if we’re restricting it to those arguments made by people who’ve actually thought about the issue more than the average Twitter poster. The concern has always been with AGI systems which are superhuman across a broad range of capabilities, specifically including those which are necessary for achieving difficult goals over long time horizons.
Human bodies and the food, water, air, sunlight, etc. we need to live are resources (“you are made of atoms the AI can use for something else”); and we’re also potential threats (e.g., we could build a rival superintelligent AI that executes a totally different plan).
This is an incredibly generic claim to the point of being meaningless. There is a reason why most human plans don’t start with “wipe out all of my potential competitors”.
From Rob’s first footnote:
Also, I wrote this post to summarize my own top reasons for being worried, not to try to make a maximally compelling or digestible case for others. I don’t expect others to be similarly confident based on such a quick overview, unless perhaps you’ve read other sources on AI risk in the past.
But aside from that, how is the claim generic or meaningless? To the contrary, it seems quite specific to me—it’s pointing out specific reasons why a plan that wasn’t very narrowly optimized for human survival & flourishing would, by default, not include those things in the world it produced.
Current ML work is on track to produce things that are, in the ways that matter, more like “randomly sampled plans” than like “the sorts of plans a civilization of human von Neumanns would produce”. (Before we’re anywhere near being able to produce the latter sorts of things.)
If by current ML you mean GPT, then it is LITERALLY trained to imitate humans (by predicting the next token). If by current ML you mean something else, say what you mean. I don’t think anyone is building an AI that randomly samples plans, as that would be exceedingly inefficient.
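To make "trained by predicting the next token" concrete, here is a minimal sketch of the objective GPT-style models are optimized on. The vocabulary and scores are toy illustrations; real models compute learned logits over vocabularies of ~10^5 tokens.

```python
import numpy as np

# Toy next-token objective: the model assigns a score (logit) to each
# vocabulary item as a candidate next token, and training minimizes the
# cross-entropy of the token a human actually wrote next.
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([0.5, 2.0, 0.1, -1.0])  # model's scores for the next token

probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary
target = vocab.index("cat")                    # the token the human text contains
loss = -np.log(probs[target])                  # cross-entropy at this position
```

Minimizing this loss over a large corpus of human-written text is, in this sense, training the model to imitate the distribution of human output.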
The task of predicting the next token does not seem like it would lead to cognition that resembles “very smart humans thinking smart-human-shaped thoughts”.
The key differences between humans and “things that are more easily approximated as random search processes than as humans-plus-a-bit-of-noise” lie in lots of complicated machinery in the human brain.
See the above objection. But also, just because humans are complex biological things doesn’t mean approximating our behaviors is similarly complex.
This doesn’t seem to engage at all with the actual details laid out for that point.
We should be starting with a pessimistic prior about achieving reliably good behavior in any complex safety-critical software, particularly if the software is novel.
You can start with whatever prior you want. The point of being a good Bayesian is updating your prior when you get new information. What information about current ML state of the art caused you to update this way?
I don’t see any update being described in the text you quote, but there is relevant detail later in that point which, had you read it, I’d have expected you to mention.
Neither ML nor the larger world is currently taking this seriously, as of April 2023.
The people taking this “seriously” don’t seem to be doing a better job than the rest of us.
Is this supposed to be an argument?
As noted above, current ML is very opaque, and it mostly lets us intervene on behavioral proxies for what we want, rather than letting us directly design desirable features.
Again, I’m begging you to actually learn something about the current state of the art instead of making claims!
I read your post and it does not describe a way for us to “directly design desirable features” in our current ML paradigm. I think “current ML is very opaque” is a very accurate summary of our understanding of how current ML systems perform complicated cognitive tasks. (We’ve gotten about as far as figuring out how a toy network performs modular addition.)
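For a sense of what that hard-won toy-network understanding looks like: interpretability work on modular addition found that trained networks effectively score each candidate answer with sums of cosines, exploiting the fact that cos(2πk(a+b−c)/p) peaks exactly when c ≡ a+b (mod p). A sketch of that recovered algorithm (the specific frequencies here are illustrative; a trained network picks its own):

```python
import numpy as np

p = 113  # modulus used in the toy-model interpretability work

# A handful of frequencies (illustrative choice; learned frequencies
# vary from training run to training run).
freqs = [1, 2, 3, 4, 5]

def mod_add_logits(a, b):
    # Score every candidate answer c in 0..p-1 by summing
    # cos(2*pi*k*(a+b-c)/p) over the chosen frequencies.
    # Each term equals 1 exactly when c == (a+b) % p, so the
    # correct answer gets the uniquely maximal total score.
    c = np.arange(p)
    return sum(np.cos(2 * np.pi * k * (a + b - c) / p) for k in freqs)

assert int(np.argmax(mod_add_logits(17, 99))) == (17 + 99) % p
```

That a result this clean took serious reverse-engineering effort for a one-layer toy network is the point: we are nowhere near doing the same for frontier models.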
How familiar are you with LoRAs, textual inversion, latent-space translations, and the like? These are all techniques invented within the last year that let us directly add (or subtract) features from neural networks in a way that is easy and natural for humans to work with. Want to teach your AI what “modern Disney style” animation looks like? It sounds like a horribly abstract and complicated concept, but we can now explain it to an AI in a process that takes under an hour and a few megabytes of storage, and the result can be reused across a wide variety of neural networks. This paper in particular is fantastic because it lets you define “beauty” as “I don’t know what it is, but I know it when I see it” and turn that into a concrete representation.
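The core idea behind LoRA-style adaptation can be sketched in a few lines: keep the pretrained weight frozen and add a trainable low-rank correction. Shapes, scaling, and initialization below follow the usual convention, but this is an illustrative numpy sketch, not any particular library's implementation (real LoRAs adapt attention weights inside a transformer).

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4  # rank r << d: the "feature" lives in a tiny subspace

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection (init to zero)
alpha = 8.0                            # scaling knob for adapter strength

def adapted_forward(x):
    # Base model output plus the learned low-rank correction (B @ A):
    # only A and B are trained, so the "feature" costs r*(d_in + d_out)
    # parameters instead of d_in * d_out.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero the adapter is a no-op, so the pretrained
# behavior is preserved exactly until the adapter is trained.
assert np.allclose(adapted_forward(x), W @ x)
```

The tiny parameter count of A and B is why a learned concept fits in a few megabytes and can be shared and composed across models with compatible weight shapes.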
That does indeed seem like some progress, though note that it does not really let us answer questions like “what algorithm is this NN performing that lets it do whatever it’s doing?”, to a degree of understanding sufficient to implement that algorithm directly in hand-written code rather than an ML model (or even a simpler approximation of it, which would still be a meaningful advance over the previous state of the art).
I think that to the extent we need to answer “what algorithm?” style questions, we will do it with techniques like this one where we just have the AI write code.
But I don’t think “what algorithm?” is a meaningful question to ask regarding “Modern Disney Style”; the question is too abstract to have a clean-cut definition in terms of human-readable code. It’s sufficient that we can define and use it given a handful of exemplars in a way that intuitively agrees with human perception of what those words should mean.