I think the popular version of this worry is Prisoner’s-Dilemma-shaped: someone else (not just you) might build an ASI that extorts everyone (including you) who didn’t contribute to its construction. So it’s a coordination problem, which is generally a worrisome thing. But it’s somewhat silly, because to get into the Prisoner’s Dilemma shape (where the issue would be coordinating to avoid building the extortion ASI), you first need to coordinate with everyone on the stipulation that the potential ASIs getting built must be extortion ASIs in particular, not other kinds of ASIs. That is itself a difficult coordination problem, and one that intentionally targets a weirdly menacing outcome, which should make it harder still. So there is a coordination-problem aspect that would by itself be worth worrying about (the Prisoner’s Dilemma among human builders or contributors), but it gets defeated by another coordination problem (deciding from the outset to build only extortion ASIs, if any ASIs are going to be built at all).
In the real world, Nature and human nature might’ve already coordinated for the potential ASIs getting built (on the current trajectory, that means soon and without an appropriate level of preparation and caution) to have a significant probability of killing everyone. So, weirdly enough, the silly hypothetical coordination to build only extortion ASIs might find its real-world counterpart in implicit coordination to build only potentially omnicidal ASIs, which are even worse than extortion ASIs. Since they don’t spare their builders, it’s not a Prisoner’s Dilemma situation (you don’t win more by building them if others ban/pause ASIs for the time being), so it should be easier to ban/pause potentially omnicidal ASIs than it would be to ban/pause extortion ASIs. But the claim that ASIs built on the current trajectory, with anything resembling current methods, are potentially omnicidal (given the current state of knowledge about how they work and what happens if you build them) is for some reason insufficiently obvious to everyone. So coordination still appears borderline infeasible in the real world, at least until something changes, such as another 10-20 years passing without AGI and bringing a cultural shift, perhaps due to widespread job displacement after the introduction of continual-learning LLMs that still fail to gain general RL competence and so don’t pose an AGI-level threat.
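To make the structural difference concrete, here’s a toy payoff sketch in Python (the numbers are invented and purely illustrative; only their ordering matters):

```python
# Toy payoffs for a single potential builder, given what everyone else does.
# All numbers are invented for illustration; only their ordering matters.

# Case 1: extortion ASI. It spares (or rewards) its builders and extorts
# everyone else, so building strictly beats abstaining whatever others do.
# That is the Prisoner's Dilemma shape: an individual incentive to defect
# from a ban, even though everyone prefers the no-ASI outcome.
extortion_payoffs = {
    # (you build, others build): your payoff
    (True, True): 1,     # spared as a contributor
    (True, False): 2,    # sole builder, spared and first
    (False, True): -5,   # extorted non-contributor
    (False, False): 0,   # successful ban
}

# Case 2: potentially omnicidal ASI. It doesn't spare its builders, so
# building gains you nothing even when others hold off: no incentive to
# defect, hence not a Prisoner's Dilemma, and a ban/pause is easier to hold.
omnicidal_payoffs = {
    (True, True): -10,
    (True, False): -10,
    (False, True): -10,
    (False, False): 0,
}

def build_is_best_response(payoffs, others_build):
    """Whether building beats abstaining, given the others' choice."""
    return payoffs[(True, others_build)] > payoffs[(False, others_build)]

for name, table in (("extortion", extortion_payoffs), ("omnicidal", omnicidal_payoffs)):
    print(name, {others: build_is_best_response(table, others) for others in (False, True)})
```

The only thing the numbers encode is that the extortion ASI spares its builders and the omnicidal one doesn’t; that alone flips whether building is ever a best response.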
I don’t think this comment touches upon the actual reason why I expect a ‘basilisk’ to possibly exist. It seems like you believe it’s possible to (collectively) choose whether or not to build an ASI with the predispositions of the basilisk, which might have been the premise of the original basilisk post, but what worries me more than this is the possibility that a future ASI wants current humans to accelerate its creation, or more likely still, to maximize the probability of its existence. This seems like a predictable preference for an AI to have.
“what worries me more than this is the possibility that a future ASI wants current humans to accelerate its creation, or more likely still, to maximize the probability of its existence”
That doesn’t imply extortion, especially s-risk extortion. (I didn’t intend s-risk extortion as the meaning of “extortion ASI” in my comment above, just any sort of worse outcome used to set up a blackmail kind of Prisoner’s Dilemma.)
So in your mind the counterpart to lethal misalignment ASI by default is s-risk extortion ASI by default. I still don’t see what essential role acausal coordination would play in any of this, hence the setup I sketched above, with Prisoner’s Dilemma among mere humans, and ASIs that could just look at the physical world once they are built, in a perfectly causal manner. (Substitute my use of mere extortion ASIs with s-risk extortion ASIs, or my use of omnicidal ASIs with unconditional s-risk ASIs, if that makes it easier to parse and extract the point I’m trying to make. I don’t think the arguments about decision making here depend on talking about s-risk as opposed to more mundane worse outcomes.)
“So in your mind the counterpart to lethal misalignment ASI by default is s-risk extortion ASI by default.” Possibly.
“I don’t think the arguments about decision making here depend on talking about s-risk as opposed to more mundane worse outcomes.”
I agree. It seems like you are not aware of the main reason to expect acausal coordination here. Maybe I shouldn’t tell you about it...
Coordination not to build wouldn’t help (even if successful): you can’t defeat an abstract entity, or prevent it from doing something in its own abstract world, by preventing the existence of its instances in the physical world (intentionally or not), and it can still examine everyone’s motivations and act accordingly. I just suspect that the step of actually building it is a major component of the anxiety this seems to produce in some people.
Without the step where an extortion ASI actually gets built, this seems closely analogous to Pascal’s wager (not mugging). There are too many possible abstract entities, acting in all sorts of ways in response to all sorts of conditions, to make it possible to just point at one of them and have it notice this in an important way. The importance of what happens with all possible abstract entities has to be divided among them, and each of them only gets a little, cashing out as how much influence what happens with that entity has on what you should do.
So I don’t think there is any reason to expect that any particular arbitrarily selected abstract bogeyman is normatively important for your decision making, because there are all the other abstract bogeymen you are failing to consider. And when you do consider all possible abstract bogeymen, it should just add up to normality.
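A toy way to see the arithmetic of that dilution (the setup and numbers are mine, purely illustrative):

```python
# Toy model of the "too many bogeymen" point (setup and numbers invented,
# purely illustrative). Suppose there are N equally arbitrary possible
# abstract entities, each caring about some action X of yours with a random
# sign: some punish doing X, some punish refraining from X. With nothing
# singling any of them out, each gets prior weight ~1/N.
import random

random.seed(0)
N = 1_000_000
prior_per_entity = 1.0 / N

signs = [random.choice((-1, +1)) for _ in range(N)]

# Influence of one arbitrarily chosen bogeyman on your decision, and the
# net pull of the whole ensemble (opposite-sign entities roughly cancel):
single_bogeyman = prior_per_entity
net_pressure = prior_per_entity * sum(signs)

print(f"one arbitrary bogeyman: {single_bogeyman:.1e}")
print(f"all of them together:   {net_pressure:+.1e}")
# Both are ~0: considering them all just adds up to normality.
```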
“Without the step where an extortion ASI actually gets built, this seems closely analogous to Pascal’s wager (not mugging).” The problem is, I expect it to be built, and I expect being built to be something instrumentally valuable to it in a way that cannot be inverted without making it much less likely, whereas the idea of a god who punishes those who don’t believe it exists can be inverted.
Then that is a far more salient issue than any acausal blackmail it might have going in its abstract form, which is the only thing that happens in the outcomes where it doesn’t get built (and where it remains unimportant). This just illustrates how the acausal aspects of any of this don’t seem cruxy/relevant, and why I wrote the (top level) answer above the way I did, getting rid of anything acausal from the structure of the problem (other than what acausal structure remains in ordinary coordination among mere humans, guided by shared/overlapping abstract reasons and explanations).
I don’t think I can prevent it from being created. But I do have some ability to influence whether it has an acausal incentive to hurt me (if in fact it has one).
If you can’t affect the creation of an extortion ASI, then you can’t affect its posited acausal incentives either, since these things are one and the same.
Within the hypothetical of expecting likely creation of an extortion ASI, what it does and why is no longer unimportant, and Pascal’s wager issues no longer apply. Though it still makes sense to remain defiant (to the extent you do have the ability to affect the outcomes), feeding the principle that blackmail works more rarely and that there’s coordination around defying it, and maintaining the integrity of the worlds that (as a result) remain less affected by its influence.
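For what it’s worth, the defiance point can be put as a toy calculation (all parameters invented, purely illustrative):

```python
# Toy model of why widespread defiance undercuts blackmail (all parameters
# invented, purely illustrative). A would-be extortionist weighs the gain
# from targets who give in against the cost of carrying out threats against
# those who don't. If enough targets are predictably defiant, committing to
# the threat has negative expected value, so it isn't made in the first place.

def threat_value(p_give_in, gain_if_complied=1.0, cost_of_punishing=0.5):
    """Expected value to the extortionist of committing to the threat."""
    return p_give_in * gain_if_complied - (1.0 - p_give_in) * cost_of_punishing

for p in (0.9, 0.5, 0.2):
    v = threat_value(p)
    print(f"fraction giving in = {p:.1f} -> worth threatening: {v > 0} (EV {v:+.2f})")

# Each act of defiance nudges that fraction down, which is the sense in which
# defiance "feeds the principle that blackmail works more rarely".
```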
“Within the hypothetical of expecting likely creation of an extortion ASI, what it does and why is no longer unimportant, and Pascal’s wager issues no longer apply.” I disagree with this. It makes sense for an ASI to want to increase the probability (by which I mean the proportion of the platonic/mathematical universe in which it exists) of its creation, even if it’s already likely (and certain in worlds where it already exists). “Though it still makes sense to remain defiant (to the extent you do have the ability to affect the outcomes), feeding the principle that blackmail works more rarely and that there’s coordination around defying it, and maintaining the integrity of the worlds that (as a result) remain less affected by its influence.” All else being equal, yes, but when faced with the possibility of whatever punishment a ‘basilisk’ might inflict on me, I might have to give in.
Thanks for this comment; I will have to think about it before I decide what to make of it.