Humans have all the resources: they don’t need the internet, computers, or electricity to live or wage war, and they are willing to resort to extremely drastic measures when facing a serious threat.
Current human society definitely relies in substantial part on all of the above to function. I agree that we wouldn’t all die if we lost electricity for an extended period starting tomorrow, but losing a double-digit % of the population seems plausible.
Also, observably, we, as a society, do not resort to sensible measures when dealing with a serious threat (e.g. covid).
It’s true that an AI could correct its own flaws using experimentation. This cannot lead to perfection, however, because the process of correcting itself is also necessarily imperfect.
This doesn’t seem relevant. It doesn’t need to be perfect, merely better than us along certain axes, and we have an existence proof that such improvement is possible.
For these reasons, I expect AGI to be flawed, and especially flawed when doing things it was not originally meant to do, like conquering the entire planet.
Sure, maybe we get very lucky and land in the (probably extremely narrow) strike zone between “smart enough to meaningfully want things and try to optimize for them” and “dumb enough to not realize it won’t succeed at takeover at its current level of capabilities”. It’s actually not at all obvious to me that such a strike zone even exists if you’re building on top of current LLMs, since those come pre-packaged with genre savviness, but maybe.
I believe that all plans for world domination will involve incomputable steps. In my post I use Yudkowsky’s “mix proteins in a beaker” scenario, where I think the modelling of the proteins is unlikely to be accurate enough to produce a nano-factory without extensive trial-and-error experimentation.
If such experimentation were required, it would mean that the timeline for takeover is much longer, that significant mistakes by the AI are possible (due to bad luck), and that takeover plans might be detectable. All of this greatly decreases the likelihood of AI domination, especially if we are actively monitoring for it.
This is doing approximately all of the work in this section, I think.
There indeed don’t seem to be obvious-to-human-level-intelligence world domination plans that are very likely to succeed.
It would be quite surprising if physics ruled out world domination from our current starting point.
I don’t think anybody is hung up on “the AI can one-shot predict a successful plan that doesn’t require any experimentation or course correction” as a pre-requisite for doom, or even comprise a substantial chunk of their doom %.
Assuming that the AI will make significant mistakes that are noticeable by humans as signs of impending takeover is simply importing the assumption of hitting some very specific (and possibly non-existent) zone of capabilities.
Ok, so it takes a few extra months. How much does that buy us? The active monitoring you want to rely on currently doesn’t exist, and progress on advancing mechanistic interpretability certainly seems to be going slower than progress on advancing capabilities (i.e. we’re getting further away from our target over time, not closer to it).
I think, more fundamentally, that this focus on a specific scenario is missing the point. Humans do things that are “computationally intractable” all the time, because it turns out that reality is compressible in all sorts of interesting ways, and furthermore you very often don’t need an exact solution. Like, if you asked humans to create the specific configuration of atoms that you’d get from detonating an atomic weapon in some location, we wouldn’t be able to do it. But that doesn’t matter, because you probably don’t care about that specific configuration of atoms, you just care about having very thoroughly blown everything up, and accomplishing that turns out to be surprisingly doable. It seems undeniably true that sufficiently smarter beings are more successful at rearranging reality according to their preferences than others. Why should we expect this to suddenly stop being true when we blow past human-level intelligence?
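To make the “you rarely need the exact solution” point concrete, here’s a toy sketch (my own illustration, not something from the original post): finding the *exact* shortest travelling-salesman tour is intractable as the problem grows, but a crude nearest-neighbour heuristic gets you a tour that’s good enough, almost for free.

```python
import itertools
import math
import random

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(9)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tour_length(order):
    # Total length of the closed tour visiting cities in this order.
    return sum(dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

# Exact: brute force over all (n-1)! tours. Fine at n=9, hopeless at n=30.
best = min(itertools.permutations(range(1, len(cities))),
           key=lambda p: tour_length((0,) + p))
exact = tour_length((0,) + best)

# "Good enough": greedy nearest-neighbour, O(n^2), no real search at all.
unvisited, tour = set(range(1, len(cities))), [0]
while unvisited:
    nxt = min(unvisited, key=lambda c: dist(cities[tour[-1]], cities[c]))
    tour.append(nxt)
    unvisited.remove(nxt)
approx = tour_length(tour)

print(f"optimal tour: {exact:.3f}, greedy tour: {approx:.3f}")
```

The greedy tour is somewhat longer than the optimal one, but it costs almost nothing to compute, and if all you care about is “visit everything without wasting too much fuel”, it does the job, which is the analogue of not needing the exact post-detonation configuration of atoms.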
I think the strongest argument here is that in sufficiently constrained environments, you can discover an optimal strategy (e.g. tic-tac-toe), at which point additional intelligence stops being useful. Real life is very obviously not that kind of environment. One of the few reliably reproduced social science results is that additional intelligence is enormously useful within the range of human intelligence, in terms of people accomplishing their goals.
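For concreteness, here’s a minimal sketch (again mine, purely illustrative) of why intelligence saturates in a constrained environment: tic-tac-toe is small enough that a brute-force minimax search recovers the optimal strategy outright, and once you have it, no additional smarts can improve your result.

```python
from functools import lru_cache

# All eight winning lines on a 3x3 board, as index triples.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    # Game value from X's perspective: +1 X wins, 0 draw, -1 O wins,
    # assuming both sides play optimally from here.
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0  # board full: draw
    nxt = "O" if player == "X" else "X"
    vals = [value(board[:i] + player + board[i + 1:], nxt)
            for i, cell in enumerate(board) if cell == "."]
    return max(vals) if player == "X" else min(vals)

# Perfect play from the empty board is a draw: once the whole game tree
# has been searched, extra intelligence buys literally nothing.
print(value("." * 9, "X"))  # -> 0
```

Real life never exhausts its game tree like this, which is exactly why more intelligence keeps paying off there.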
Point 3: premature rebellion is likely
This seems possible to me, though I do think it relies on landing in that pretty narrow zone of capabilities, and I haven’t fully thought through whether premature rebellion is actually the best-in-expectation play from the perspective of an AI that finds itself in such a spot.
This manager might not be that smart, in the same way that the manager of a team of scientists at a company doesn’t need to be smarter than them.
This doesn’t really follow from any of the preceding section. Like, yes, I do expect a future ASI to use specialized algorithms for performing various kinds of specialized tasks. It will be smart enough to come up with those algorithms, just like humans are smart enough to come up with chess-playing algorithms which are better than humans at chess. This doesn’t say anything about how relatively capable the “driver” will be, when compared to humans.
In the only test we actually have available of high level intelligence, the instrumental convergence hypothesis fails.
Huh? We observe humans doing things that instrumental convergence would predict all the time: resource acquisition, self-preservation, maintaining goal stability, etc. No human has the option of ripping the earth apart for its atoms, which is why you don’t see that happening. If I gave you a button that would, if pressed, guarantee that the lightcone would end up tiled with whatever your CEV said was best (i.e. highly eudaimonic human-descended civilizations doing awesome things), with no tricks/gotchas/“secretly this is bad” involved, are you telling me you wouldn’t press it?
The instrumental convergence argument is only strong for fixed-goal expected value maximisers.
To the extent that a sufficiently intelligent agent can be anything other than an EV maximizer, this still seems wrong. Most humans’ extrapolated preferences would totally press that button.
I don’t think anybody is hung up on “the AI can one-shot predict a successful plan that doesn’t require any experimentation or course correction” as a pre-requisite for doom, or even comprise a substantial chunk of their doom %.
I would say that anyone stating...
If somebody builds a too-powerful AI, under present conditions, I expect that every single member of the human species and all biological life on Earth dies shortly thereafter.
(EY, of course)
...is assuming exactly that. Particularly given the “shortly”.
No, Eliezer’s explicitly clarified that isn’t a required component of his model.
Has he? A lot of his arguments hinge on us dying shortly after it appears.