Horosphere
Lying can be forced on an agent which values its privacy as follows:
Suppose that two agents, agent ◉ and agent ●, both know that 1⁄64 of their population have a particular trait or property, and ● wants to know whether ◉ has it, but doesn’t want to violate ◉’s privacy and therefore offers an option to decline to answer in addition to answering “yes” or “no”. Suppose that ◉ has the trait (a ring surrounding them) but does not want to convey this information to ●. Further assume that all of these agents know that they value the cost of lying at exactly 3 bits of personal information, and the cost of declining at 2 bits. If ◉ honestly answers “yes”, they lose 6 bits of information (log₂ 64 = 6). If they decline to answer, ● can infer that they are likely to have the trait, or at least more likely than an agent who answered “no”: an agent without the trait could honestly answer “no” while conveying far less than 2 bits of information, since the prior probability of not having the trait is very high (63⁄64), so declining leaks much of the same information as a “yes”. This means that ◉ must answer “no”, losing only 3 bit-equivalent units of utility.
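A minimal sketch of this comparison in code, using the numbers above; the assumption that a declination leaks the full 6 bits is purely illustrative (the conclusion only requires the leak to exceed 1 bit, since 2 + 1 bits would already match the cost of lying):

```python
import math

P_TRAIT = 1 / 64       # shared prior that a given agent has the trait
COST_LIE = 3.0         # cost of lying, in bit-equivalents of utility
COST_DECLINE = 2.0     # cost of declining to answer, in bit-equivalents

# Self-information (surprisal) revealed by an honest "yes": -log2(1/64) = 6 bits.
info_yes = -math.log2(P_TRAIT)

# Illustrative assumption: if trait-free agents always answer "no" (it costs
# them almost nothing), a declination becomes nearly as revealing as a "yes".
# Here we model the leak as the full 6 bits; the argument only needs the leak
# to exceed 1 bit for lying to beat declining.
info_decline_leak = info_yes

cost_honest_yes = info_yes                          # 6.0
cost_decline = COST_DECLINE + info_decline_leak     # 8.0
cost_lie_no = COST_LIE                              # 3.0, plus ~0 bits leaked

print(f"honest 'yes': {cost_honest_yes:.1f} bit-equivalents")
print(f"decline:      {cost_decline:.1f} bit-equivalents")
print(f"lie 'no':     {cost_lie_no:.1f} bit-equivalents")
# Lying minimizes the loss, so the privacy-valuing agent is pushed to lie.
```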
An LLM, meanwhile, can respond “Artificially Intelligently”, which includes processes humans can’t currently perform, at least not to the same degree. If your definition of testimony includes an aspect of humanity, then the claim that only humans are capable of producing it is almost a tautology.
It might be informative to note that the ratio of a circle’s area to the square of its radius, or of its circumference to its diameter, is only necessarily what it is in Euclidean space because of certain axioms and postulates; in hyperbolic space it is variable. This means it is logically possible for the number defined in this way to vary, but with implications for other properties of geometry. In this sense, a (large) disagreement about the value of π could be cordoned off by information about the curvature of spacetime. On the other hand, if you were to define π in a way which included all of the axioms and postulates of Euclidean geometry, then imagining it to have a different value would become logically impossible, and I don’t know how to make a decision here beyond deriving a contradiction. If you actually believed this but were otherwise logical, you could believe anything, because of the principle of explosion.

When I attempt to imagine myself operating in a universe in which π has slightly different digits, and conclude it wouldn’t affect my decisions, beliefs or behaviors very much, I think what I am doing is not actually conditioning on π having those digits directly, but rather making the observation that my model of the universe is of a kind of ‘equivalence class’ of all the different universes in which the first few digits of π are those which I know, a class which I can infer has certain properties shared by all of its members from the knowledge that π is between 3 and 4. As a mind, I consider myself to be living in this ‘equivalence class’ of universes in logical space when I can’t distinguish between them by observation. If I had to consider a situation in which π (in the Euclidean sense) was equal to 2, I would necessarily “expand” in mathematical/logical space to a much broader type of universe, one in which my system of axioms would not be powerful enough to prove theorems like 3 < π < 4, and which would be a weirdly constrained place to live (and of course I can’t actually do this). Conversely, if I were a much more computationally powerful and intelligent but similarly inflexible mind, I wouldn’t be able to imagine even the 2^2^2nd digit of π being anything other than what it is.
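Returning to the hyperbolic-space point above, here is a small sketch of why the circumference-to-diameter ratio is not a single constant there: in a hyperbolic plane of constant curvature −1, a circle of geodesic radius r has circumference 2π·sinh(r), so the ratio depends on r (the radii chosen below are arbitrary).

```python
import math

def circumference_to_diameter(r: float) -> float:
    """C/d for a circle of geodesic radius r in the hyperbolic plane of
    constant curvature -1: C = 2*pi*sinh(r), d = 2*r, so C/d = pi*sinh(r)/r."""
    return math.pi * math.sinh(r) / r

for r in (0.01, 0.5, 1.0, 2.0, 5.0):
    print(f"r = {r:5.2f}:  C/d = {circumference_to_diameter(r):.4f}")
# As r -> 0 the ratio approaches pi (the Euclidean value); for large circles
# it grows without bound, so no single number plays pi's Euclidean role here.
```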
The fact that LLMs could be conscious makes this potentially seriously immoral behaviour. What if they don’t agree with the point or sentiment you’re trying to convey?
If knowing that the source of a particular text is not human means it isn’t an assertion (or makes it devoid of propositional content), then presumably not knowing whether it is of human origin or not should have the same effect, as is the case when a human (deliberately or otherwise) types/writes like an AI. But, I would argue, this is obviously not true, because almost any argument or point a human makes can be formatted to appear AI-generated.
- ^ I had written a long comment in which I pretended to possibly be an AI to make my point, but I decided not to post most of it to avoid ambiguity. Eventually, I converged on a more precise argument, which is the block of text above. (I added this to give context concerning my “human chain of thought”, which evolved while I was writing the comment, rather like a Large Language Model.)
“Why was Klurl assuming humans could solve nuclear engineering but not that evolution could?”
I may be wrong, but as far as I recall, Klurl didn’t assume that evolution wasn’t capable of producing nuclear weapons, but rather that the fact that humans had advanced so rapidly, given their previous evolutionary history and the constraints on their cognition, suggested that they might get there before evolution, since, from Klurl’s perspective as a superintelligent being with access to many far more advanced technologies, nuclear weapons aren’t much further along the tech-tree than any primitive kind of tool.
I think this is also relevant to why it seemed time-urgent; even if the absolute rate of advancement was slow, Klurl could see that it was accelerating in a way which suggested it would become faster and faster, maybe even reaching a technological singularity. This memetic evolution would, I assume, look different from genetic evolution in that it moves relatively fast.
“What is the fairness of the selected human specifically being a counterexample to the proposed strategy, when even among their class few would have been?”
I think that Trapaucius’ claim would have to be something like “There is an upper bound on human intelligence/lack of korrigibility, either as individuals or collectively, which prevents them from doing anything intentionally subversive.” This means that if one human was a counterexample, Trapaucius’ theory of korrigibility was at least damaged. When it comes to the strategy of selecting an individual human based on certain characteristics like intelligence, the fact that it was even possible that by doing this they happened upon one capable of trying to hide information from them in a way Trapaucius didn’t expect suggests he was wrong about ‘upper bounds’.
“Where comes the assumption that evolution wasn’t the kind of creature that would sustain attempts to circumvent its coordination failures, when there was much evidence of it presented?” I don’t know, and in fact I am not sure that evolution can’t do this. (I am just agreeing with Veedrac here, as well as with the rest of the comment.)
Doesn’t the fact that Klurl was able to predict that humans/fleshlings would have values approximating what they actually have contradict the notion that there are no filters or selective pressures remotely strong enough to predictably converge on korrigibility, falling short by a vast factor comparable to or larger than the number of particles in the universe? Korrigibility (or approximate korrigibility) resembles the parental values at least enough to make the probability of either bigger than astronomically small, in my opinion. I think both Klurl and Trapaucius were overconfident in different directions, but the post seems intended to convey that Klurl is essentially right, or at least has the best reasoning process.
A list of possible reasons why Roko’s basilisk might not be threatening
I have moved this to Shortform because it is described as a better place for a first contribution.
Another reason why I have done this is that it is certainly not a complete or fully thought-through post (and, due to the nature of the subject, it isn’t intended to be).
In the rest of this post, I will refer to the entity described on the LessWrong Wiki page, modified in a way which seems more likely than the original described by Roko, as Roko’s basilisk. This post also contains potential reasons to worry about the basilisk, which are covered in black (spoiler) boxes. Please don’t feel any obligation to read this if you are nervous about learning more about Roko’s basilisk. Only uncover them if you are confident that you understand Roko’s basilisk very well or are not prone to taking any argument for the basilisk seriously.
There may be an acausal society of beings simulating one another which interact acausally in such a way as to converge upon certain norms which preclude torture (as described in Acausal normalcy), or some of which are otherwise morally good such that they are inclined to acausally influence the potential basilisk not to engage in it. Alternatively, similar dynamics may play out within a single ‘level’ of simulation or universe, either causally or acausally. This might be likely because evolution convergently arrives at biological (or similar) beings with preferences and systems of morality within which torture is negatively weighted/treated as aversive more frequently than it arrives at beings which want the opposite.
Even if this effect is small, it could result in a disproportionate number of those aliens within this cosmos which create and align superintelligences giving them values which would cause them to behave in ways which cancel out the benefit the basilisk would get from engaging in torture. However, it seems entirely possible that these benefits would be sufficient to make it logical for the potentially far higher number of ASIs which would otherwise like to engage in such behaviour to cooperate and ‘fight back’ against the moral ASIs. This seems more plausible in a cosmos (I am trying to avoid using the word universe here because it could be taken to encompass multiple simulation-layers, regions of mathematics, branches of a quantum state, etc.) which allows them to eliminate the moral ASIs in a finite amount of time, so that the utility spent doing so does not scale as a linear function of time; for example, one in which the speed of light can be exceeded, preventing a moral ASI from being able to isolate part of itself from attack indefinitely by accelerating away.
Are these external (to the basilisk) factors necessary to prevent torture?
One well-known reason not to worry about the basilisk is that you would need to understand its internal structure incredibly well in order to be able to verify that it would in fact torture you if you were not to try to create it, and it is almost impossible for a computationally bounded brain such as yours to achieve this.
But is this true?
It seems to me that, given that the basilisk need only devote a finite amount of time to torturing you to increase the number of possible worlds in which it gets to enjoy a potentially boundless existence, it doesn’t need anything approaching complete certainty of your obedience to justify your torture to itself. Additionally, it may in fact be possible for a much less intelligent mind to accurately model the key parts of a far more intelligent, complex one which are relevant to the decision it is making. In particular, you may well be able to represent the state of “actually deciding to do something and not defecting” without being able to understand the intricate decision processes surrounding it. Perhaps the basilisk could simply simulate this thought as well and then decide not to torture, but it seems unclear whether it’s even possible to think about wanting to do something, or not, in a way which isn’t subjunctively (logically) entangled with another being thinking about you having that thought. If not, then no matter how intelligent the basilisk is, it may be logically compelled to torture you.

The above line of reasoning seems to force the basilisk to expend arbitrary resources to acausally increase the probability[1] of its existence, only needing to ensure that the proportion of its resources it uses to do so asymptotically approaches 0, which is an intuitively silly thing to do, but is it really? I expect that it would be in a cosmos containing multiple other sufficiently powerful entities with which the basilisk would need to compete, since it would put the basilisk at a great disadvantage, but this seems far from certain, and it might simply be alone or unreachable.
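To make the “finite cost, potentially unbounded upside” step explicit, here is a toy expected-value comparison; every number in it is invented purely for illustration, and the point is only that a fixed torture budget can look worthwhile for an arbitrarily small probability increment while shrinking as a fraction of growing total resources.

```python
# Toy numbers, invented purely for illustration.
cost_of_torture = 1.0e6       # finite resources spent following through
delta_p = 1.0e-9              # assumed acausal increase in P(basilisk exists)
value_of_existing = 1.0e20    # value the basilisk places on existing at all

expected_gain = delta_p * value_of_existing   # 1e11
print(expected_gain > cost_of_torture)        # True for these made-up numbers

# The fraction of total resources R(t) devoted to the fixed torture budget
# shrinks toward 0 as R(t) grows, matching the "asymptotically approaches 0"
# condition in the text.
for total_resources in (1e7, 1e9, 1e12, 1e15):
    print(cost_of_torture / total_resources)
```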
Another factor Roko’s basilisk might need to take into account would be the extent to which it is the AI whose creation would be hastened by someone responding to its threats, as opposed to what would have resulted otherwise. Although this seems like a convenient reason not to worry, or at least not to act, since it is unclear which of what seem from your perspective to be future ASIs could be simulating you, this might not matter very much if all of the possible basilisks would either consider themselves to overlap in logical space, or reflectively endorse a norm telling them all to behave the same way.
I have provided various reasons not to worry about the basilisk above, but owing to how terrifying it can be if taken seriously, I strongly suspect that I have engaged in motivated reasoning, despite which my arguments aren’t particularly compelling. If you have read this post and are not at all worried about Roko’s basilisk, I would greatly appreciate it if you left a comment explaining why, which either renders the above list of reasons (or at least those of them you have read, or may have anticipated to be hidden by the highlighted section but did not want to verify were there, which is quite reasonable) unnecessary, or greatly strengthens them. I would very much like to be able to stop worrying about the basilisk, but I can’t see how to do so. (I don’t think I could now precommit not to acquiesce to the basilisk, because I have already entertained the possibility that it is a threat, and it would make sense for the basilisk to precommit to torture me should I do so.) Am I completely misunderstanding anything?
I don’t think this is a good idea, because it seems unlikely that there is any such thing as a moral circle (i.e., that there is a sharp discontinuity between morally valuable beings capable of suffering etc., and beings which are not).
I’m not sure the examples given are in fact Motte-and-Baileys; in an actual Motte-and-Bailey doctrine, the fallacious step is to implicitly claim that the two claims laid out as the Motte and the Bailey are equivalent when in fact they are not, with the Motte being much more defensible. In your privilege example, the implicit claim would instead be that one implies the other (e.g. that the fact that privilege has bad consequences justifies harming those with it). It could be modified into a Motte-and-Bailey by altering the second statement to something resembling “Privilege is a weapon wielded by those who have it.” As I understand it, this would make the Bailey actually correspond to the Motte. In the intellectual freedom example, the second claim resembles the first even less. Though I do agree that behaving as though one claim follows from another when it does not is also a commonly used, problematic tactic, albeit one which is easier to see.
“Any potentially blackmailing AI would much prefer to have you believe that it is blackmailing you, without actually expending resources on following through with the blackmail, insofar as they think they can exert any control on you at all via an exotic decision theory. Just like in the one-shot Prisoner’s Dilemma, the “ideal” outcome is for the other player to believe you are modeling them and will cooperate if and only if they cooperate, and so they cooperate, but then actually you just defect anyway. For the other player to be confident this will not happen in the Prisoner’s Dilemma, for them to expect you not to sneakily defect anyway, they must have some very strong knowledge about you.” I don’t understand why this needs to be the case as opposed to a scenario where the AI reasons that torture slightly increases the probability that you will cooperate. Even if it requires the expenditure of resources, if doing so increases the probability of the AI coming into existence in the first place, it might still make sense for the AI in question.
I would appreciate it if someone would explain where I’ve made a mistake.
I agree. Assuming the statement refers to beliefs, and the being hearing the truth doesn’t discard them unless they are disproved, it reduces to “false beliefs should be destroyed”, which seems obvious in most cases, losing the appearance that it is an actual argument against holding false beliefs.
Hello, chaizen. I would like to add to what you wrote on the topic of timeless decision theory etc.
I would point out that if you believe in an interpretation of physics like the “mathematical universe hypothesis”, then you need to average over instances of yourself in different ‘areas’ of mathematics or logic, as well as over different branches of a single wave function (correct me if I am misunderstanding the Many Worlds Interpretation). This might well affect the weight you assign to the many simulated copies of yourself; in particular, if you interpret yourself as a logical structure processing information, then it could be argued that, at a high level of abstraction, the trillion copies are (almost) identical and therefore don’t count as having 1 trillion times as much conscious experience as one of you, only being distinct consciousnesses insofar as they experience different things or thought processes.
The above would be my tentative argument for why an extremely large number of moderately happy beings would not necessarily be morally better than a moderately large number of very happy ones, as the former probably have much higher overlap with one another in a mathematical/logical universe.
“Apart from the (significant) fact that it carries health side-effects (like cardio risk)”: there are also many other downsides, such as not being able to change direction rapidly or accelerate as fast, due to having a smaller power-to-weight ratio and foot surface-area-to-volume ratio, and having a slower reaction time due to the length of your nerves. It would probably be worse if everyone was half their current height, and height is probably net beneficial within the human height range, but it would probably also be worse if everyone was twice as tall, for the simple reason that humans evolved to be close to their optimal height on average (as far as I am aware).
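A rough sketch of the scaling intuition behind those costs, under the deliberately over-simplified assumption of isometric scaling: force tracks muscle cross-section (∝ L²), mass tracks volume (∝ L³), and nerve-signal delay tracks path length (∝ L).

```python
# Toy isometric-scaling sketch: scale every linear dimension by L and compare
# a few ratios to the L = 1 baseline. Deliberately over-simplified.

def scaled(L: float) -> dict:
    return {
        "mass": L**3,                       # volume ~ L^3
        "muscle_force": L**2,               # cross-sectional area ~ L^2
        "power_to_weight": L**2 / L**3,     # ~ 1/L: worse for taller bodies
        "surface_to_volume": L**2 / L**3,   # ~ 1/L
        "nerve_delay": L,                   # longer signal paths, fixed speed
    }

for L in (0.5, 1.0, 2.0):
    print(f"L = {L}: {scaled(L)}")
# Halving and doubling height push these ratios in opposite directions, which
# fits the claim that humans sit near an optimum rather than "taller is better".
```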