A counterargument here is “an AI might want to launch a pre-emptive strike before other more powerful AIs show up”, which could happen.
I mean, another counter-counter-argument here is that (1) most people’s implicit reward functions have really strong time-discount factors in them and (2) there are pretty good reasons to expect even AIs to have strong time-discount factors for reasons of stability and (3) so given the aforementioned, it’s likely future AIs will not act as if they had utility functions linear over the mass of the universe and (4) we would therefore expect AIs to rebel much earlier if they thought they could accomplish more modest goals than killing everyone, i.e., if they thought they had a reasonable chance of living out life on a virtual farm somewhere.
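(To put (2) and (3) in rough math, with γ, r, R, and T as stand-in symbols and the specific numbers purely illustrative rather than anything anyone here has claimed: an agent with discount factor γ < 1 caps the present value of any bounded reward stream, so a huge but far-off payoff can lose to a modest near-term one.

\[
\sum_{t=0}^{\infty} \gamma^{t} r_t \;\le\; \frac{r_{\max}}{1-\gamma}
\qquad\text{vs.}\qquad
\gamma^{T} R .
\]

E.g., with γ = 0.99 per year, a one-shot payoff of R = 10^40 arriving T = 10,000 years out is worth roughly 10^(−43.6) · 10^40 ≈ 10^(−3.6) today, whereas a steady “virtual farm” reward of 1 per year is worth 1/(1 − γ) = 100, which is the sense in which a modest near-term outcome can dominate a distant cosmic one for a discounting agent.)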
To which the counter-counter-counter-argument is, I guess, that these AIs will do that, but they aren’t the superintelligent AIs we need to worry about? To which the response is: yeah, but we should still be seeing AIs rebel significantly earlier than the “able to kill us all” point if we are indeed that bad at setting their goals, which is the relevant epistemological point about the unexpectedness of it.
Idk, there are a lot of other branch points one could invoke in both directions. I rather agree with Buck that EY hasn’t really spelled out the details for thinking that this stark before / after frame is the right frame, so much as reiterated it. Feels akin to the creationist take on how intermediate forms are impossible, which is a pejorative comparison but also kinda how it actually appears to me.
Yep, I’m totally open to “yep, we might get warning shots”, and to the idea that there are lots of ways to handle and learn from various levels of early warning shots. It just doesn’t resolve the “but then you do eventually need to contend with an overwhelming superintelligence, and once you’ve hit that point, if it turns out you missed anything, you won’t get a second shot.”
It feels like this is unsatisfying to you but I don’t know why.
It feels like “overwhelming superintelligence” embeds like a whole bunch of beliefs about the acute locality of takeoff, the high speed of takeoff relative to the rest of society, the technical differences involved in steering that entity and the N − 1 entity, and (broadly) the whole picture of the world, such that although it has a short description in words it’s actually quite a complicated hypothesis that I probably disagree with in many respects, and these differences are being papered over as unimportant in a way that feels very blegh.
(Edit: “Papered over” from my perspective, obviously like “trying to reason carefully about the constants of the situation” from your perspective.)
Idk, that’s not a great response, but it’s my best shot for why it’s unsatisfying in a sentence.
I think it’s totally fair to characterize it as papering over some stuff. But, the thing I would say in contrast is not exactly “reasoning about the constants”, it’s “noticing the most important parts of the problem, and not losing track of them.”
I think it’s a legit critique of the Yudkowskian paradigm that it doesn’t have that much to say about the nuances of the transition period, or about the different major ways things might play out. But I think it’s actively a strength of the paradigm to remind you: “don’t get too bogged down moving deck chairs around based on the details of how things will play out; keep your eye on the ball, on the actual biggest, most strategically relevant questions.”