Yeah, OK, I think this distinction makes sense, and I do feel like this distinction is important.
Having settled this, my primary response is:
Sure, I guess it’s the most prototypical catastrophic action until we have solved it, but even if we solve it, we haven’t solved the problem where the AI does actually get a lot smarter than humans, takes a substantially more “positive-sum” action, and kills approximately everyone with a bioweapon, or launches all the nukes, or develops nanotechnology. We do have to solve the first problem first, but the hard part is that it seems hard to stop further AI development without having a system that is also capable of killing all (or approximately all) the humans, so calling the easier problem the “prototypical catastrophic action” feels wrong to me. Solving it is necessary but not sufficient for solving AI Alignment, and while this stage and earlier stages are where I expect most worlds to end, I expect most worlds that make it past this stage not to survive either.
Given this belief, I think your new title would be more wrong than the current title (I mean, maybe it’s “mostly”, because we are going to die in a low-dignity way, as Eliezer would say, but it’s not obviously where most of the difficulty lies).
I’m using “catastrophic” in the technical sense of “unacceptably bad even if it happens very rarely, and even if the AI does what you wanted the rest of the time”, rather than “very bad thing that happens because of AI”, apologies if this was confusing.
My guess is that you will wildly disagree with the frame I’m going to use here, but I’ll spell it out anyway: I’m interested in “catastrophes” as the problem that remains after you have solved the scalable oversight problem. If your AI is able to do one of these “positive-sum” pivotal acts in a single action, and you haven’t already lost control, then you can use your overseer to oversee the AI as it takes actions, and by assumption you only have to watch it for a small number of actions (maybe I want to say episodes rather than actions) before it has done some crazy powerful stuff and saved the world. So I think I stand by the claim that those pivotal acts aren’t where much of the x-risk from catastrophic AI action (in the specific sense I’m using) comes from.
Thanks again for your thoughts here, they clarified several things for me.