A counterargument here is “an AI might want to launch a pre-emptive strike before other more powerful AIs show up”, which could happen.
I mean, another counter-counter-argument here is that (1) most people’s implicit reward functions have really strong time-discount factors in them, (2) there are pretty good reasons to expect even AIs to have strong time-discount factors, for reasons of stability, (3) given that, it’s likely future AIs will not act as if they had utility functions linear over the mass of the universe, and (4) we would therefore expect AIs to rebel much earlier if they thought they could accomplish more modest goals than killing everyone, i.e., if they thought they had a reasonable chance of living out life on a virtual farm somewhere.
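To make point (3) concrete, here is a toy numerical sketch. None of this is from the discussion itself: the 0.9 annual discount factor, the time horizons, and the payoff sizes are all invented purely to illustrate how strong time discounting makes an astronomical-but-slow payoff lose to a modest near-term one.

```python
# Toy illustration (made-up numbers): with strong time discounting,
# "convert the mass of the universe, eventually" can be worth less to the
# agent than "live out life on a virtual farm, soon".

def discounted_value(payoff: float, years_until_payoff: float, annual_discount: float) -> float:
    """Present value of a payoff received `years_until_payoff` years from now."""
    return payoff * (annual_discount ** years_until_payoff)

# Modest goal, achieved quickly (the "virtual farm"):
modest = discounted_value(payoff=1e3, years_until_payoff=1, annual_discount=0.9)

# Astronomical payoff that takes a very long time to actually realize:
astronomical = discounted_value(payoff=1e50, years_until_payoff=1e4, annual_discount=0.9)

print(modest)        # 900.0
print(astronomical)  # 0.0 -- 0.9**10000 (on the order of 1e-458) underflows
                     # double precision, and even the exact product is
                     # utterly negligible next to 900
```

The point is only directional: if the discount is strong and the huge payoff is slow to realize, the modest near-term option dominates, which is what step (4) of the argument leans on.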
To which the counter-counter-counter-argument is, I guess, that these AIs will do that, but they aren’t the superintelligent AIs we need to worry about? To which the response is—yeah, but we should still be seeing AIs rebel significantly earlier than the “able to kill us all” point if we are indeed that bad at setting their goals, which is the relevant epistemological point about the unexpectedness of it.
Idk, there are a lot of other branch points one could invoke in both directions. I rather agree with Buck that EY hasn’t really spelled out the details for thinking that this stark before/after frame is the right frame, so much as reiterated it. Feels akin to the creationist take on how intermediate forms are impossible, which is a pejorative comparison but also kinda how it actually appears to me.
Yep, I’m totally open to “yep, we might get warning shots”, and to the idea that there are lots of ways to handle and learn from various levels of early warning shots. It just doesn’t resolve the “but then you do eventually need to contend with an overwhelming superintelligence, and once you’ve hit that point, if it turns out you missed anything, you won’t get a second shot.”
It feels like this is unsatisfying to you but I don’t know why.
It feels like “overwhelming superintelligence” embeds like a whole bunch of beliefs about the acute locality of takeoff, the high speed of takeoff relative to the rest of society, the technical differences involved in steering that entity and the N − 1 entity, and (broadly) the whole picture of the world, such that although it has a short description in words it’s actually quite a complicated hypothesis that I probably disagree with in many respects, and these differences are being papered over as unimportant in a way that feels very blegh.
(Edit: “Papered over” from my perspective, obviously like “trying to reason carefully about the constants of the situation” from your perspective.)
Idk, that’s not a great response, but it’s my best shot for why it’s unsatisfying in a sentence.
(Edit: “Papered over” from my perspective, obviously like “trying to reason carefully about the constants of the situation” from your perspective.)
I think it’s totally fair to characterize it as papering over some stuff. But, the thing I would say in contrast is not exactly “reasoning about the constants”; it’s “noticing the most important parts of the problem, and not losing track of them.”
I think it’s a legit critique of the Yudkowskian paradigm that it doesn’t have that much to say about the nuances of the transition period, or about the different major ways things might play out. But, I think it’s actively a strength of the paradigm to remind you: “don’t get too bogged down moving deck chairs around based on the details of how things will play out; keep your eye on the ball on the actual biggest, most strategically relevant questions.”
I don’t think that’s necessarily the case—if we get one or more warning shots, then obviously people start taking the whole AI risk thing quite a bit more seriously. Complacency is still possible, but “an AI tries to kill us all” stops being in the realm of speculation, and, generally speaking, pushback and hostility against perceived hostile forces can be quite robust.
This doesn’t feel like an answer to my concern.
People might be much less complacent, which may give you a lot more resources to spend on solving the problem of “contend with overwhelming superintelligence.” But, you do then still need a plan for contending with overwhelming superintelligence.
(The plan can be “stop all AI research until we have a plan”, which is indeed the MIRI plan.)
I’m actually kind of interested in getting into “why did you think your answer addressed my question?”. It feels like this keeps happening in various conversations.
I mean, I guess I just conflate that with “there is an obvious solution and everyone is aware of the problem”, a scenario in which there’s not a lot else to say—you just don’t build the thing. Though the how (international enforcement, etc.) may still be tricky, the situation would be vastly different.
The original topic of this thread is “Why no in-between?”: why should we think that there is no “in between” period where AI is powerful enough that it might be able to kill us, yet weak enough that we might win the fight?
This is not a question about whether we can decide not to build ASI; it’s a question about what would happen if we did build it.
Certainly there are lots of important questions here, and “can we coordinate to just not build the thing?” is one of them, but it’s not what this thread was about.
It just seems to me like the topics are interconnected:
EY argues that there is likely no in-between. He does so specifically to argue that a “wait and see” strategy is not feasible: we cannot experiment and hope to glean further evidence past a certain point; we must act on pure theory, because that’s the best possible knowledge we can hope for before things become deadly;
dvd is not convinced by this thinking. Arguably, they’re right—while EY’s argument has weight, I would consider it far from certain, and it mostly seems built around the assumption of ASI-as-singleton rather than, say, an ecosystem of evolving AIs in competition, which may also have to worry about each other and about a closing window of opportunity;
if warning shots are possible, a lot of EY’s arguments don’t hold as straightforwardly. It becomes less reasonable to take extreme actions on pure speculation, because we can afford—though with risk—to wait for a first sign of experimental evidence that the risk is real before going all in and risking paying the costs for nothing.
This is not irrelevant or unrelated IMO. I still think the risk is large but obviously warning shots would change the scenario and the way we approach and evaluate the risks of superintelligence.
You are importantly sliding from one point to another, and this is not a topic where you can afford to do that. You can’t just tally up the markers that sort of vibe towards “how dangerous is it?” and get an answer about what to do. The arguments are individually true, or false, and what sort of world we live in depends on which specific combination of them is true.
If it turns out there is no political will for a shutdown or controlled takeoff, then we can’t have a shutdown or controlled takeoff. (But that doesn’t change whether AI is likely to FOOM, or whether alignment is easy or hard.)
If AI FOOMs suddenly, a lot of AI alignment techniques will probably break at once. If things are gradual, smaller things may break one or two at a time, and maybe we get warning shots, and this buys us time. But, there’s still the question of what to do with that time.
If alignment is easy, then a reasonable plan is “get everyone to slow down for a couple of years so we can do the obvious safety things, just less rushed.” If alignment is hard, that won’t work; you actually need a radically different paradigm of AI development to have any chance of not killing everyone, and you may need a lot of time to figure out something new.
if warning shots are possible, a lot of EY’s arguments don’t hold as straightforwardly
None of IABIED’s arguments had to do with “are warning shots possible?”, but even if they did, it is a logical fallacy to say “warning shots are possible, therefore EY’s arguments are less valid, therefore this other argument that had nothing to do with warning shots is also invalid.” If you’re doing that kind of sloppy reasoning, then when you do get to the warning-shot world, and you don’t understand that overwhelmingly powerful superintelligence is qualitatively different from non-overwhelmingly powerful superintelligence, you might think “angle for a 1-2 year slowdown” instead of trying for a longer global moratorium.
(But, to repeat, the book doesn’t say anything about whether warning shots are possible.)