But, in those cases, it's most likely better for the AI to wait until it gets more powerful, and it will know that it's better to wait.
But why? People foolishly start wars all the time, including in specific circumstances where it would be much better to wait.
(A counterargument here is “an AI might want to launch a pre-emptive strike before other more powerful AIs show up”, which could happen. But, if we win that war, we’re still left with “the sort of tools that can constrain a near-human superintelligence, would not obviously apply to a much smarter AI”, and we still have to solve the same problems.)
Or, having fought a “war” with an AI, we have relatively clear, non-speculative evidence about the consequences of continuing AI development. And that’s the point where you might actually muster the political will to cut that off in the future and take the steps necessary for that to really work.
People do foolishly start wars, and the AI might too; we might get warning shots. (See my response to 1a3orn about how that doesn’t change the fact that we only get one try at building safe AGI-powerful-enough-to-confidently-outmaneuver-humanity.)
A meta-thing I want to note here:
There are several different arguments here, each about a different thing. Those things do add up to an overall picture of what seems likely.
I think part of what makes this whole thing hard to think about is that you really do need to track all the separate arguments and what they imply, and remember that if one argument is overturned, that might change a piece of the picture but not (necessarily) the rest of it.
There might be human-level AI that starts normal wars for foolish reasons. And that might get us a warning shot, and that might get us more political will.
But, that’s a different argument than “there is an important difference between an AI smart enough to launch a war, and an AI that is smart enough to confidently outmaneuver all of humanity, and we only get one try to align the second thing.”
If you believe “there’ll probably be warning shots”, that’s an argument against “someone will get to build It”, but not an argument against “if someone built It, everyone would die” (where “It” specifically means “an AI smart enough to confidently outmaneuver all humanity, built by methods similar to today’s, where AIs are ‘organically grown’ in hard-to-predict ways”).
And, if we get a warning shot, we do get to learn from it, which will inform some more safeguards and alignment strategies, and which might improve our ability to predict how an AI would grow up. But that still doesn’t change the fact that, at some point, you’re dealing with a qualitatively different thing that will make different choices.
It’s a bit of both.
Suppose there are no warning shots. A hypothetical AI that’s a bit weaker than humanity but still awfully impressive doesn’t do anything at all that manifests an intent to harm us. That could mean:
1. The next, somewhat more capable version of this AI will not have any intent to harm us, because through either luck or design we’ve ended up with a non-threatening AI.
2. This version of the AI is biding its time to strike and is sufficiently good at deception that we miss that fact.
3. This AI is fine, but making it a little smarter/more capable will somehow lead to the emergence of malign intent.
I take Yudkowsky and Soares to put all the weight on #2 and #3 (with, based on their scenario, perhaps more of it on #2).
I don’t think that’s right. I think if we have reached the point where an AI really could plausibly start and win a war with us and it doesn’t do anything nasty, there’s a fairly good chance we’re in #1. We may not even really understand how we got into #1, but sometimes things just work out.
I’m not saying this is some kind of great strategy for dealing with the risk; the scenario I’m describing is one where there’s a real chance we all die, and I don’t think you get a strong signal until you get into the range where the AI might win, which is a bad range. But it’s still very different from imagining that the AI will inherently wait to strike until it has ironclad advantages.
(btw, you mentioned reading some other LW reviews, and I wanted to check if you’ve read my post, which argues some of this at more length)