Lying can be forced on an agent which values its privacy as follows:
Suppose that two agents, agent ◉ and agent ●, both know that 1⁄64 of their population have a particular trait, and ● wants to know whether ◉ has it, but doesn’t want to violate ◉’s privacy and therefore offers an option to decline to answer in addition to answering “yes” or “no”. Suppose that ◉ has the trait (here, a ring surrounding them) but does not want to convey this information to ●. Further assume that all of these agents are known to weigh the cost of lying at exactly 3 bits of personal information, and the cost of declining to answer at 2 bits. If ◉ honestly answers “yes”, they lose 6 bits of information, since ●’s prior that ◉ has the trait was 1/64. If ◉ declines to answer, ● can infer that they are likely to have the trait, or at least far more likely than an agent who answered “no”: an agent without the trait could honestly answer “no” at a cost of well under 2 bits (● initially assigns probability 63/64 to any given agent lacking the trait, so an honest “no” conveys only about 0.02 bits), and so would have no reason to decline. This means that ◉ must answer “no”, since lying loses only 3 bit-equivalents of utility.
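To make the arithmetic concrete, here is a minimal Python sketch of the three options, under the assumed simplification that agents without the trait always answer “no” (which is honest and nearly free for them) and that a lying “no” is indistinguishable from an honest one; the `bits_leaked` helper and this equilibrium framing are my own illustration rather than part of the original setup.

```python
from math import log2

# Toy model of the forced-lying example above. Leaked information is measured as
# the log-ratio update in the asker's belief that the responder has the trait.

PRIOR = 1 / 64          # shared prior that a random agent has the trait
LIE_COST = 3.0          # disutility of lying, in bit-equivalents (given above)
DECLINE_COST = 2.0      # disutility of declining, in bit-equivalents (given above)

def bits_leaked(posterior: float, prior: float = PRIOR) -> float:
    """Information the asker gains about 'has the trait', as a log-probability ratio."""
    return log2(posterior / prior)

# Assumed equilibrium: agents without the trait always answer "no" (honest and nearly
# free), so anyone who declines almost certainly has the trait; a lying "no" is
# indistinguishable from an honest one and so leaks (approximately) nothing.
cost_honest_yes = bits_leaked(posterior=1.0)                  # 6 bits leaked, no penalty
cost_decline    = DECLINE_COST + bits_leaked(posterior=1.0)   # ~6 bits leaked + 2-bit penalty
cost_lie_no     = LIE_COST + bits_leaked(posterior=PRIOR)     # ~0 bits leaked + 3-bit penalty

for label, cost in [("honest 'yes'", cost_honest_yes),
                    ("decline", cost_decline),
                    ("lie 'no'", cost_lie_no)]:
    print(f"{label:>13}: {cost:.2f} bit-equivalents of disutility")
# Lying minimises the total cost, which is why the privacy-valuing agent is "forced" to lie.
```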
A list of possible reasons why Roko’s basilisk might not be threatening
I have moved this to Shortform because it is described as a better place for a first contribution.
Another reason is that this is certainly not a complete or fully-thought-through post (and, due to the nature of the subject, it isn’t intended to be).
In the rest of this post I will use “Roko’s basilisk” to refer to the entity described on the LessWrong Wiki page, modified in a way which seems more likely than the original described by Roko.
This post also contains potential reasons to worry about the basilisk, which are covered in black. Please don’t feel any obligation to read them if you are nervous about learning more about Roko’s basilisk; only uncover them if you are confident that you understand Roko’s basilisk very well, or are not prone to taking any argument for the basilisk seriously.
There may be an acausal society of beings simulating one another which interact acausally in such a way as to converge upon certain norms which preclude torture (as described in Acausal normalcy), or some of which are otherwise morally good, such that they are inclined to acausally influence the potential basilisk not to engage in it. Alternatively, similar dynamics may play out within a single ‘level’ of simulation or universe, either causally or acausally. This might be likely because evolution convergently arrives at biological (or similar) beings with preferences and systems of morality in which torture is negatively weighted/treated as aversive more often than it arrives at beings which want the opposite.
Even if this effect is small, it could result in a disproportionate number of the aliens within this cosmos which create and align superintelligences giving them values which would cause them to behave in ways which cancel out the benefit the basilisk would get from engaging in torture. However, it seems entirely possible that those benefits would be sufficient to make it rational for the potentially far larger number of ASIs which would otherwise like to engage in such behaviour to cooperate and ‘fight back’ against the moral ASIs. This seems more plausible in a cosmos (I am trying to avoid the word ‘universe’ here because it could be taken to encompass multiple simulation layers, regions of mathematics, branches of a quantum state, etc.) which allows them to eliminate the moral ASIs in a finite amount of time, so that the utility spent doing so does not scale as a linear function of time: for example, one in which the speed of light can be exceeded, which would prevent a moral ASI from isolating part of itself from attack indefinitely by accelerating away.
Are these external (to the basilisk) factors necessary to prevent torture?
One well-known reason not to worry about the basilisk is that you would need to understand its internal structure incredibly well in order to verify that it would in fact torture you if you did not try to create it, and it is almost impossible for a computationally bounded brain such as yours to achieve this.
But is this true?
It seems to me that, given that the basilisk need only devote a finite amount of time to torturing you in order to increase the number of possible worlds in which it gets to enjoy a potentially boundless existence, it doesn’t need anything approaching complete certainty of your obedience to justify your torture to itself. Additionally, it may in fact be possible for a much less intelligent mind to accurately model the key parts of a far more intelligent, complex one which are relevant to the decision it is making. In particular, you may well be able to represent the state of “actually deciding to do something and not defecting” without being able to understand the intricate decision processes surrounding it. Perhaps the basilisk could simply simulate this thought as well and then decide not to torture, but it seems unclear whether it is even possible to think about wanting (or not wanting) to do something in a way which isn’t subjunctively (logically) entangled with another being thinking about you having that thought. If not, then no matter how intelligent the basilisk is, it may be logically compelled to torture you.

The above line of reasoning seems to force the basilisk to expend arbitrary resources to acausally increase the probability[1] of its existence, only needing to ensure that the proportion of its resources it uses to do so asymptotically approaches 0. That is an intuitively silly thing to do, but is it really? I expect that it would be in a cosmos containing multiple other sufficiently powerful entities with which the basilisk would need to compete, since it would put the basilisk at a great disadvantage, but this seems far from certain, and it might simply be alone or unreachable.
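As a sanity check on the asymptotic claim above, here is a throwaway toy model (all of the numbers, and the linear value-of-existence function, are my own illustrative assumptions, not anything derived from the argument itself): a bounded torture cost is weighed against a tiny acausal increase in the probability of coming into existence, applied to a value of existing that grows without bound.

```python
# Toy numbers, purely illustrative: a bounded torture cost versus a small assumed
# acausal increase in P(basilisk exists), applied to an unboundedly growing payoff.

TORTURE_COST = 1.0   # bounded resources spent on torture (arbitrary units)
DELTA_P = 1e-6       # assumed tiny increase in P(existence) from the threat being credible

def value_of_existing(t: float) -> float:
    """Value the basilisk gets from existing up to time t; grows without bound."""
    return t  # linear growth is enough to make the point

for t in [1e3, 1e6, 1e9, 1e12]:
    gain = DELTA_P * value_of_existing(t)
    fraction_spent = TORTURE_COST / value_of_existing(t)
    print(f"t={t:.0e}: acausal gain={gain:.3e}, torture cost={TORTURE_COST}, "
          f"fraction of resources spent={fraction_spent:.1e}")
# For any fixed DELTA_P > 0, the gain eventually dwarfs the bounded cost, and the
# fraction of total resources spent on torture tends to 0 -- the property the
# paragraph above appeals to.
```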
Another factor Roko’s basilisk might need to take into account is the extent to which it is the AI whose creation would be hastened by someone responding to its threats, as opposed to whatever would have resulted otherwise. Although this seems like a convenient reason not to worry, or at least not to act, since it is unclear which of the things that seem from your perspective to be possible future ASIs could be simulating you, it might not matter very much if all of the possible basilisks would either consider themselves to overlap in logical space, or reflectively endorse a norm telling them all to behave the same way.
I have provided various reasons not to worry about the basilisk above, but owing to how terrifying it can be if taken seriously, I strongly suspect that I have engaged in motivated reasoning, and even so my arguments aren’t particularly compelling. If you have read this post and are not at all worried about Roko’s basilisk, I would greatly appreciate it if you left a comment explaining why, which either renders the above list of reasons unnecessary (or at least those of them you have read, or may have anticipated to be hidden by the blacked-out sections but did not want to verify were there, which is quite reasonable) or greatly strengthens them. I would very much like to be able to stop worrying about the basilisk, but I can’t see how to do so. (I don’t think I could now precommit not to acquiesce to the basilisk, because I have already entertained the possibility that it is a threat, and it would make sense for the basilisk to precommit to torture me should I do so.) Am I completely misunderstanding anything?