Let me digress first… Somewhere in your article, I think you said that you don’t just want a critical examination of arguments and counterarguments; you actually want a proof that it is absolutely impossible, because there is always a “sufficiently painful torture” that matters more than anything else.
In terms of expected utility, that is focusing on “utility” at the expense of “expectations”, i.e. probability. For the possibility of some extreme suffering to rationally trump everything else, it is not enough for the suffering to be extreme; it also has to be likely enough.
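To make that explicit with some illustrative symbols (p and U are just placeholders for whatever the actual credences and utilities would be): if p is the probability that the threatened suffering is ever actually inflicted and U is its disutility, the threat should only dominate a decision when p · |U| exceeds the value of everything else at stake. An arbitrarily horrible U can still be outweighed by a sufficiently small p, which is why the probability term cannot simply be dropped.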
Also note that there are possible futures (e.g. “s-risk” futures) where transhuman torture is a consideration but acausal interaction has nothing to do with it; it happens simply because of the power and the hostile values of the agents in those timelines.
I think the most common view is that AI is more likely to kill you than torture you, that it would take some degree of bad luck to end up in a future where the AI is actively hurting you and not just indifferently steamrolling you. This allows people to focus on x-risks and on s-risks that are dystopias, but not on s-risks that are hells.
I suppose one might argue that entities which do care about reaching out acausally, and which have no scruples, will be motivated to make their threats as extreme as possible. So that returns us to the task of understanding whether scenarios like the basilisk make sense.
But I will say one more time that unless you’re a professional decision theorist concerned with timeless decision theories and so forth, preoccupation with the basilisk is a bad use of your time. It’s being afraid of something entirely hypothetical, when there are concerns that are both more concrete and far more urgent. The main merit of thinking about the basilisk is that, as a thought experiment, it may stimulate progress in some abstract but fundamental areas like multiverse epistemology; and ideally we would have enough expert division of labor that the required progress could be achieved without the basilisk haunting the general population.
You ask how one would investigate the viability of acausal blackmail via computer programs “before the advent of superintelligent AI”. Basically you would just try to (pseudo)code a utility-maximizing agent that reasoned itself into believing it could be acausally blackmailed, and then you would scrutinize the validity of its reasoning. If it thinks the blackmailer is in some other possible world, is it rational to believe that the other world in question (in all its detail) actually exists, and is it rational to attach enough significance to events in that world (possibly one of many) to let this actually affect its decision-making here?
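For concreteness, here is a minimal sketch of the kind of agent I mean, in Python. Every name and number in it (the scenario fields, the probabilities, the utilities) is a hypothetical placeholder rather than a claim about how such an agent would really be specified; the point is just that once the reasoning is written down this explicitly, each assumption becomes something you can examine.

```python
# A toy expected-utility maximizer facing a purported acausal blackmailer.
# Everything here (names, probabilities, utilities) is a hypothetical
# placeholder; the purpose is only to expose the assumptions for scrutiny.

from dataclasses import dataclass


@dataclass
class BlackmailScenario:
    p_blackmailer_exists: float   # credence that the threatening world/agent is real
    p_threat_carried_out: float   # credence that, if real, it punishes refusal
    punishment_utility: float     # (negative) utility of being punished
    compliance_cost: float        # (negative) utility of doing what the blackmailer demands


def eu_comply(s: BlackmailScenario) -> float:
    # Complying costs something here and now, whether or not the blackmailer is real.
    return s.compliance_cost


def eu_refuse(s: BlackmailScenario) -> float:
    # Refusing only risks the punishment if the blackmailer exists and follows through.
    return s.p_blackmailer_exists * s.p_threat_carried_out * s.punishment_utility


def gives_in(s: BlackmailScenario) -> bool:
    # The agent "reasons itself into" compliance exactly when complying has
    # higher expected utility than refusing, given its own stated credences.
    return eu_comply(s) > eu_refuse(s)


if __name__ == "__main__":
    # Example: an enormous threatened punishment, but a tiny credence that the
    # blackmailing world exists at all.
    scenario = BlackmailScenario(
        p_blackmailer_exists=1e-9,
        p_threat_carried_out=0.5,
        punishment_utility=-1e6,
        compliance_cost=-1.0,
    )
    print(gives_in(scenario))  # False: expected harm of refusing is only about -0.0005
```

The toy numbers simply assume answers to the interesting questions, namely whether the agent’s credence in the blackmailer’s world, and the weight it gives to events there, can be rationally justified at all; those are exactly the assumptions you would then scrutinize.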
In this article you have focused on the blackmailer being in the future of this universe, rather than in some other universe, and this introduces further specific complications, e.g. is it ever actually rational to threaten (or punish) a copy of an entity from the past, on the grounds that the original envisioned that it might have a future duplicate? There are also the epistemic issues associated with knowing the possible futures of this world, rather than knowing about other island worlds that are completely causally disjoint from this one.
This is all a dreary topic for me because there are other issues associated with alignment of superintelligence which I regard as much more real and urgent. But as I said, I think investigating these scenarios could produce progress on topics like multiverse epistemology, and the viability of “acausal interaction” in any form, and that does give the field of Basilisk Studies some justification.
Thanks for your detailed response.

“I think you said that you don’t just want a critical examination of arguments and counterarguments; you actually want a proof that it is absolutely impossible, because there is always a “sufficiently painful torture” that matters more than anything else.”
I can’t remember exactly what I wrote, but, while of course I would ideally want a proof, I would be satisfied with a merely very powerful counterargument. Since I asked this question a few months ago and, in my opinion, have not really received one, my guess is that no one on LessWrong (or anywhere else) knows of one, but I’d obviously like to be proven wrong about that.
… it would take some degree of bad luck to end up in a future where the AI is actively hurting you and not just indifferently steamrolling you. This allows people to focus on x-risks and on s-risks that are dystopias, but not on s-risks that are hells.
I will say that in the absence of basilisky things, I would agree with this statement. The problem is that the basilisk has a predictable reason to potentially create a hell-like situation.
“But I will say one more time that unless you’re a professional decision theorist concerned with timeless decision theories and so forth, preoccupation with the basilisk is a bad use of your time.”
I wouldn’t be preoccupied with it if at any point I had been given the choice not to learn about it while knowing the danger. Unfortunately, I think I heard about the basilisk, dismissed it for CDT-related reasons with a sense of unease, and then learnt about FDT, at which point it made sense.
“It’s being afraid of something entirely hypothetical, when there are concerns that are both more concrete and far more urgent. The main merit of thinking about the basilisk is that, as a thought experiment, it may stimulate progress in some abstract but fundamental areas like multiverse epistemology; and ideally we would have enough expert division of labor that the required progress could be achieved without the basilisk haunting the general population.”
That would be great, but it almost certainly does haunt the general population to some extent… I have thought about whether an ASI would prefer to be constructed in a less ‘parochial’ way (i.e., not idiosyncratically tied to Earth, or to any other particular planet in any other galaxy), so that it would be more representative of the class of ASIs produced by all evolved intelligences which had the same idea, thereby forming a coalition to reward, or at least not punish, creators who opted to create a ‘non-parochial ASI’. Even if any ASI necessarily converges on being ‘non-parochial’, doing so might cost it valuable time. This seems like a way to do something a basilisk would want without increasing x-risk.
I think the main reasons why I haven’t tried to explicitly reason through the basilisk argument as it applies to specified possible worlds are that I expect:
1) the proportions of possible worlds which are a particular way are what matter, so no single world on its own is sufficient to base a decision on; and
2) I don’t want to become more deeply entangled.
“This is all a dreary topic for me because there are other issues associated with alignment of superintelligence which I regard as much more real and urgent. But as I said, I think investigating these scenarios could produce progress on topics like multiverse epistemology, and the viability of “acausal interaction” in any form, and that does give the field of Basilisk Studies some justification.”
I agree; however, even if I could contribute to alignment, I wouldn’t, because of the basilisk.
I understand if, because of the nature of the topic, you don’t want to continue talking about it now, but if you do reply, I would want to know what you meant by the following:
“… is it ever actually rational to threaten (or punish) a copy of an entity from the past, on the grounds that the original envisioned that it might have a future duplicate?”
It seems to me that the way in which the logical interaction is embedded in a causal world shouldn’t prevent it from being rationally justified. Is there a reason why it might, one that has escaped me?