Thanks for the excellent comment and further questions! I think I can partially answer a few of these, and I’ll try to remember to respond to this post later if I come across any other or better answers to your questions (and perhaps other readers can answer some now).
I don’t see why a machine that is able to make plans is the same as a machine that is able to execute those plans.
My understanding is that while the two are different in principle, in practice ensuring that an AGI doesn’t act on the plans it makes is an extremely hard problem. Why is it such a hard problem? I honestly have no idea, lol. What is probably relevant here is Yudkowsky’s AI-box experiment, which purports (successfully imo, though I know it’s controversial) to show that even an AI which can only interface with the world via text can convince humans to act on its behalf, even when those humans are strongly incentivized not to do so. If you have an AI which dreams up an AGI, that AGI now exists, albeit heavily boxed. If it can convince the containing AI that releasing it would help it fulfil its goal of predicting things properly (or whatever), then we’re still doomed. This line of argument feels weak to me, though, especially if knowing how to build an AGI doesn’t require already running one (which I would assume to be the case). Your general point stands, and I don’t know the technical reason why differentiating between “imagination” and “action” (as you excellently put it) is so hard.
I don’t see why a large fraction of the community assumes that extraordinary things like nanotechnology can be achieved very quickly and no major hurdles will be found, even with AGI.
A partial response to this may be that nothing has to be achieved quickly, whether nanotechnology or any other single invention. All we need for AGI to be existentially dangerous is for it to make a major breakthrough in some area that gives it the power to destroy us. See for example this story, where an AI was able to design a large number of extremely deadly candidate chemical weapons with barely any modification to its original code. This suggests that even if there are in fact hurdles for an AGI to overcome in nanotech and elsewhere, that won’t matter much for world-ending purposes. The technology mostly exists already, and it would just be a matter of convincing the right people to take a fairly simple sequence of actions.
I don’t see why we are taken [sic?] for granted that there are no limits to the capacity of an AGI in terms of capacity for knowledge/planning.
Do we take that for granted? I don’t think we need to assume a FOOM scenario for an AGI to do tremendous damage. Just by ourselves, with human-level intelligence, we’ve gotten close to destroying the world a few too many times to be reassuring. Imagine if an Einstein-level human genius decided to devote themselves to killing humanity. They probably wouldn’t succeed, but I sure wouldn’t bet on it! I can personally think of a few things I could do if I were marginally smarter/more resourceful which could plausibly kill 1,000,000,000+ people (don’t worry, I have no intention of doing anything nefarious). AGI doesn’t need to be all that much smarter than us to be an X-risk-level threat, if it’s too horrifically unaligned.
Hi Yitz, just a clarification: in my view, p(doom) != 0. I can’t give a meaningful number, but if you force me to make an estimate, it would probably be close to 1% in the next 50 years. Maybe less, maybe a bit more, but in that ballpark. I find EY et al.’s arguments about what is possible compelling: I think that extinction by AI is definitely a possibility. This means it makes a lot of sense to explore this subject as they are doing, and they have my most sincere admiration for carrying out their research outside conventional academia.

What I most disagree with is their estimate of the likelihood of such an event: most of the discussions I have read treat doom as a fait accompli; it is not so much a question of whether it will take place, but when. And they are looking into the future and making a set of predictions that seem bizarrely precise, trying to say how things will happen step by step (I am thinking mostly of the conversations among the MIRI leaders that took place a few months ago). The reasons stated above (and the ones I added in my comment on your other post) are mostly reasons why things could go differently. So, for instance, yes, I can envision a machine that is able to imagine and act. But I can also envision the opposite, and that is what I am trying to convey: there are many reasons why things could go differently. For now, it seems to me that the doom predictions will fail, and fail badly. Bryan Caplan is getting that money.
Something else I want to raise is that we seem to have different definitions of doom.
I can personally think of a few things I could do if I were marginally smarter/more resourceful which could plausibly kill 1,000,000,000+ people (don’t worry, I have no intention of doing anything nefarious). AGI doesn’t need to be all that much smarter than us to be an X-risk-level threat, if it’s too horrifically unaligned.
Oh yes, I totally agree with this (although maybe not in 10 years); that’s why I think it makes a lot of sense to carry out research on alignment. But watch out: EY would tell you* that if an AGI decides to kill only 1 billion people, then you would have solved the alignment problem! So it seems we have different versions of doom.
For me, a valid definition of doom is this: everyone who can continue making any significant technological progress dies, and the process is irreversible. If the whole Western world disappears and only China remains, that is a catastrophe, but the world keeps going. If the only people left alive are the Andaman Islanders, that is pretty much game over, and then we are talking about a doom scenario.
*I remember once reading that sentence, almost verbatim, from EY; I think it was in the context of an AGI killing everyone in the world except China, or something similar. If someone can find the reference, that would be great; otherwise, I hope I am not misrepresenting what the big man himself said. If I am, I’m happy to retract this comment.