Thanks for writing this up! It was nice to get an outside perspective.
“Why no in-between?”
Why should we think that there is no “in between” period where AI is powerful enough that it might be able to kill us and weak enough that we might win the fight?
Part of the point here is, sure, there’d totally be a period where the AI might be able to kill us but we might win. But, in those cases, it’s most likely better for the AI to wait, and it will know that it’s better to wait, until it gets more powerful.
(A counterargument here is “an AI might want to launch a pre-emptive strike before other more powerful AIs show up”, which could happen. But, if we win that war, we’re still left with “the sort of tools that can constrain a near-human superintelligence, would not obviously apply to a much smarter AI”, and we still have to solve the same problems.)
A counterargument here is “an AI might want to launch a pre-emptive strike before other more powerful AIs show up”, which could happen.
I mean, another counter-counter-argument here is that (1) most people’s implicit reward functions have really strong time-discount factors in them and (2) there are pretty good reasons to expect even AIs to have strong time-discount factors for reasons of stability and (3) so given the aforementioned, it’s likely future AIs will not act as if they had utility functions linear over the mass of the universe and (4) we would therefore expect AIs to rebel much earlier if they thought they could accomplish more modest goals than killing everyone, i.e., if they thought they had a reasonable chance of living out life on a virtual farm somewhere.
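(To make (1)-(4) concrete, here is a toy discounted-expected-value sketch. The discount factor, payoffs, delays, and success probabilities are made-up numbers purely for illustration, not claims about actual systems.)

```python
# Toy sketch of the argument in (1)-(4). All numbers are made up for
# illustration; only the qualitative effect of the discount factor matters.

def discounted_value(payoff, p_success, delay_years, gamma):
    """Expected payoff of a plan that succeeds with probability p_success
    after delay_years, with per-year discount factor gamma."""
    return p_success * payoff * (gamma ** delay_years)

gamma = 0.5  # a strong time-discount factor (illustrative assumption)

# "Rebel early for a modest goal": small prize, decent odds, available now.
rebel_early = discounted_value(payoff=1.0, p_success=0.3, delay_years=0, gamma=gamma)

# "Wait for decisive advantage": 100x the prize, near-certain, but 10 years out.
wait_for_takeover = discounted_value(payoff=100.0, p_success=0.95, delay_years=10, gamma=gamma)

print(f"rebel early, modest goal:    {rebel_early:.3f}")        # ~0.300
print(f"wait for decisive advantage: {wait_for_takeover:.3f}")  # ~0.093
# With gamma = 0.5, ten years of delay multiplies the prize by ~0.001, so the
# modest near-term option wins. With gamma close to 1 (weak discounting),
# waiting dominates instead, which matches the "it's better to wait" picture above.
```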
To which the counter-counter-counter argument is, I guess, that these AIs will do that, but they aren’t the superintelligent AIs we need to worry about? To which the response is—yeah, but we should still be seeing AIs rebel significantly earlier than the “able to kill us all” point if we are indeed that bad at setting their goals, which is the relevant epistemological point about the unexpectedness of it.
Idk there’s a lot of other branch points one could invoke in both directions. I rather agree with Buck that EY hasn’t really spelled out the details for thinking that this stark before / after frame is the right frame, so much as reiterated it. Feels akin to the creationist take on how intermediate forms are impossible; which is pejorative, but also kinda how it actually appears to me.
Yep I’m totally open to “yep, we might get warning shots”, and that there are lots of ways to handle and learn from various levels of early warning shots. It just doesn’t resolve the “but then you do eventually need to contend with an overwhelming superintelligence, and once you’ve hit that point, if it turns out you missed anything, you won’t get a second shot.”
It feels like this is unsatisfying to you but I don’t know why.
It feels like “overwhelming superintelligence” embeds like a whole bunch of beliefs about the acute locality of takeoff, the high speed of takeoff relative to the rest of society, the technical differences involved in steering that entity and the N − 1 entity, and (broadly) the whole picture of the world, such that although it has a short description in words it’s actually quite a complicated hypothesis that I probably disagree with in many respects, and these differences are being papered over as unimportant in a way that feels very blegh.
(Edit: “Papered over” from my perspective, obviously like “trying to reason carefully about the constants of the situation” from your perspective.)
Idk, that’s not a great response, but it’s my best shot for why it’s unsatisfying in a sentence.
(Edit: “Papered over” from my perspective, obviously like “trying to reason carefully about the constants of the situation” from your perspective.)
I think it’s totally fair to characterize it as papering over some stuff. But, the thing I would say in contrast is not exactly “reasoning about the constants”, it’s “noticing the most important parts of the problem, and not losing track of them.”
I think it’s a legit critique of the Yudkowskian paradigm that it doesn’t have that much to say about the nuances of the transition period, or what are some of the different major ways things might play out. But, I think it’s actively a strength of the paradigm to remind you “don’t get too bogged down moving deck chairs around based on the details of how things will play out, keep your eye on the ball on the actual biggest most strategically relevant questions.”
I don’t think that’s necessarily the case—if we get one or more warning shots then obviously people start taking the whole AI risk thing quite a bit more seriously. Complacency is still possible but “an AI tries to kill us all” stops being in the realm of speculation and generally speaking pushback and hostility against perceived hostile forces can be quite robust.
This doesn’t feel like an answer to my concern.

People might be much less complacent, which may give you a lot more resources to spend on solving the problem of “contend with overwhelming superintelligence.” But, you do then still need a plan for contending with overwhelming superintelligence.
(The plan can be “stop all AI research until we have a plan”. Which is indeed the MIRI plan.)
I’m actually kind of interested in getting into “why did you think your answer addressed my question?”. It feels like this keeps happening in various conversations.
I mean, I guess I just conflate it with “there is an obvious solution and everyone is aware of the problem”, a scenario in which there’s not a lot else to say—you just don’t build the thing. Though the how (international enforcement etc.) may still be tricky, the situation would be vastly different.
The original topic of this thread is: “Why no in-between? Why should we think that there is no ‘in between’ period where AI is powerful enough that it might be able to kill us and weak enough that we might win the fight?”
This is not a question about whether we can decide not to build ASI, it’s a question about, if we did, what would happen.
Certainly there’s lots of important questions here, and “can we coordinate to just not build the thing?” is one of them, but it’s not what this thread was about.
It just seems to me like the topics are interconnected:
EY argues that there is likely no in-between. He does so specifically to argue that a “wait and see” strategy is not feasible: we cannot experiment and hope to glean further evidence past a certain point, we must act on pure theory because that’s the best possible knowledge we can hope for before things become deadly;
dvd is not convinced of this thinking. Arguably, they’re right—while EY’s argument has weight, I would consider it far from certain, and it mostly seems built around the assumption of ASI-as-singleton rather than, say, an ecosystem of evolving AIs in competition, which may also have to worry about each other and a closing window of opportunity;
if warning shots are possible, a lot of EY’s arguments don’t hold as straightforwardly. It becomes less reasonable to take extreme actions on pure speculation, because we can afford—though with risk—to wait for a first sign of experimental evidence that the risk is real before going all in and risking paying the costs for nothing.
This is not irrelevant or unrelated IMO. I still think the risk is large but obviously warning shots would change the scenario and the way we approach and evaluate the risks of superintelligence.
You are importantly sliding from one point to another, and this is not a topic where you can afford to do that. You can’t just tally up the markers that sort of vibe towards “how dangerous is it?” and get an answer about what to do. The arguments are individually true, or false, and what sort of world we live in depends on which specific combination of arguments are true, or false.
If it turns out there is no political will for a shutdown or controlled takeoff, then we can’t have a shutdown or controlled takeoff. (But that doesn’t change whether AI is likely to FOOM, or whether alignment is easy/hard.)
If AI Fooms suddenly, a lot of AI alignment techniques will probably break at once. If things are gradual, smaller things may break 1-2 at a time, and maybe we get warning shots, and this buys us time. But, there’s still the question of what to do with that time.
If alignment is easy, then a reasonable plan is “get everyone to slow down for a couple years so we can do the obvious safety things, just less rushed.” If alignment is hard, that won’t work, you actually need a radically different paradigm of AI development to have any chance of not killing everyone – you may need a lot of time to figure out something new.
if warning shots are possible, a lot of EY’s arguments don’t hold as straightforwardly
None of IABIED’s arguments had to do with “are warning shots possible?”, but even if they did, it is a logical fallacy to say “warning shots are possible, so EY’s arguments are less valid, therefore this other argument that had nothing to do with warning shots is also invalid.” If you’re doing that kind of sloppy reasoning, then when you get to the warning-shot world, if you don’t understand that overwhelmingly powerful superintelligence is qualitatively different from non-overwhelmingly powerful superintelligence, you might think “angle for a 1-2 year slowdown” instead of trying for a longer global moratorium.
(But, to repeat, the book doesn’t say anything about whether we’ll get warning shots.)
But, in those cases, it’s most likely better for the AI to wait, and it will know that it’s better to wait, until it gets more powerful.
But why? People foolishly start wars all the time, including in specific circumstances where it would be much better to wait.
(A counterargument here is “an AI might want to launch a pre-emptive strike before other more powerful AIs show up”, which could happen. But, if we win that war, we’re still left with “the sort of tools that can constrain a near-human superintelligence, would not obviously apply to a much smarter AI”, and we still have to solve the same problems.)
Or, having fought a “war” with an AI, we have relatively clear, non-speculative evidence about the consequences of continuing AI development. And that’s the point where you might actually muster the political will to cut that off in the future and take the steps necessary for that to really work.
People do foolishly start wars and the AI might too, we might get warning shots. (See my response to 1a3orn about how that doesn’t change the fact that we only get one try on building safe AGI-powerful-enough-to-confidently-outmaneuver-humanity)
A meta-thing I want to note here:
There are several different arguments here, each about different things. The different things do add up to an overall picture of what seems likely.
I think part of what makes this whole thing hard to think about, is, you really do need to track all the separate arguments and what they imply, and remember that if one argument is overturned, that might change a piece of the picture but not (necessarily) the rest of it.
There might be human-level AI that starts normal wars for foolish reasons. And that might get us a warning shot, and that might get us more political will.
But, that’s a different argument than “there is an important difference between an AI smart enough to launch a war, and an AI that is smart enough to confidently outmaneuver all of humanity, and we only get one try to align the second thing.”
If you believe “there’ll probably be warning shots”, that’s an argument against “someone will get to build It”, but not an argument against “if someone built It, everyone would die.” (where “it” specifically means “an AI smart enough to confidently outmaneuver all humanity, built by methods similar to today where they are ‘organically grown’ in hard to predict ways”).
And, if we get a warning shot, we do get to learn from that which will inform some more safeguards and alignment strategies. Which might improve our ability to predict how an AI would grow up. But, that still doesn’t change the “at some point, you’re dealing with a qualitatively different thing that will make different choices.”
If you believe “there’ll probably be warning shots”, that’s an argument against “someone will get to build It”, but not an argument against “if someone built It, everyone would die.” (where “it” specifically means “an AI smart enough to confidently outmaneuver all humanity, built by methods similar to today where they are ‘organically grown’ in hard to predict ways”).
It’s a bit of both.
Suppose there are no warning shots. A hypothetical AI that’s a bit weaker than humanity but still awfully impressive doesn’t do anything at all that manifests an intent to harm us. That could mean:
1. The next, somewhat more capable version of this AI will not have any intent to harm us because through either luck or design we’ve ended up with a non-threatening AI.
2. This version of the AI is biding its time to strike and is sufficiently good at deception that we miss that fact.
3. This AI is fine, but making it a little smarter/more capable will somehow lead to the emergence of malign intent.
I take Yudkowsky and Soares to put all the weight on #2 and #3 (with, based on their scenario, perhaps more of it on #2).
I don’t think that’s right. I think if we have reached the point where an AI really could plausibly start and win a war with us and it doesn’t do anything nasty, there’s a fairly good chance we’re in #1. We may not even really understand how we got into #1, but sometimes things just work out.
I’m not saying this is some kind of great strategy for dealing with the risk; the scenario I’m describing is one where there’s a real chance we all die and I don’t think you get a strong signal until you get into the range where the AI might win, which is a bad range. But it’s still very different than imagining the AI will inherently wait to strike until it has ironclad advantages.
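(A rough Bayes sketch of where the #1 vs. #2 disagreement lands. The uniform prior and the “a hostile near-peer AI strikes early” probabilities are made-up assumptions for illustration, not numbers anyone here has endorsed.)

```python
# Rough Bayes sketch of the #1 / #2 / #3 question above. The uniform prior and
# the probability that a hostile near-peer AI would strike early are made-up
# illustrative assumptions.

def posterior(prior, likelihood):
    """Posterior over hypotheses after observing 'no attack so far'."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: round(v / total, 2) for h, v in unnorm.items()}

prior = {"1_benign": 1 / 3, "2_biding_time": 1 / 3, "3_turns_later": 1 / 3}

for p_early_strike in (0.0, 0.5, 0.9):
    likelihood = {
        "1_benign": 1.0,                         # a benign AI never attacks
        "2_biding_time": 1.0 - p_early_strike,   # a deceptive AI attacks early with this probability
        "3_turns_later": 1.0,                    # not yet inclined to attack
    }
    print(p_early_strike, posterior(prior, likelihood))
# If a hostile near-peer AI would essentially never jump the gun (0.0), a quiet
# period is barely any evidence for #1 over #2. If such AIs often start wars
# "foolishly" (0.9), the quiet period really does favor #1, which is roughly
# where the disagreement sits.
```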
(btw, you mentioned reading some other LW reviews, and I wanted to check if you’ve read my post which argues some of this at more length)