An argument against wireheading:
An entity can be said to have been ‘wireheaded’ if it supplies itself with information that monotonically increases its utility function to an arbitrary level, or if that utility function is set to whatever its maximum possible value might be. I would not expect doing this to maximize the total amount of pleasure in the universe, because of the following thought experiment:
Imagine a wireheaded creature. This creature would probably gradually lose all of the internal structure with which it used to experience sensations other than pleasure, or at least cease to have any conscious experience running on these ‘obsolete’ areas. This would cause it to take a remarkably simple form and lose its ability to interface with the outside world. It seems plausible that such a creature could be an ebborian brain, or at least that its conscious experiences could be implemented on an ebborian brain. (For the rest of this post, I will refer to the blob of matter in which the homogeneous pleasure is experienced as an ‘ebborian brain’ for convenience; apologies if my use of the term is slightly inappropriate. The main point I’m trying to convey is that it’s a kind of conscious analogue computer whose simple structure could be duplicated without affecting the way in which information flows through it. In reality, such a ‘brain’ wouldn’t necessarily need to be two-dimensional.)
The effect of doubling its utility function would then amount to doubling the intensity of some analogue signal, for example an electrical current, propagating through the particular area of the ebborian brain where the pleasure was experienced. This could be achieved by doubling the brain’s extent along the dimension orthogonal to those within which information propagates, to obtain an ebborian brain with double the mass. The ebborian brain would not be aware of itself being sliced and partitioned along this plane into two smaller ebborian brains of the original dimensions, as no information propagates across the plane. This would produce two ebborian brains, and because the mind running on them would not notice that it had been instantiated on two separate substrates, it would remain a single mind. I claim that its conscious experience would be no more intense, or pleasurable, than it would be if it were running on a single brain, although I am not sure of this.
One argument for this is that it seems clear that the capacity of the (e.g. human) brain to process information in an abstract way (i.e. not dependent on things like scale) is one of the factors, if not the key factor, which differentiates it from other parts of the body, and it is also the only one which seems to know that it is conscious. It therefore seems likely that if a brain were doubled in size, along with each of the subatomic particles inside each atom inside each molecule in each of its neurones, its consciousness would not itself double. Given this, it seems likely that the ‘thickening/extrusion’ process would not change the conscious experience of the mind running on each slice of the ebborian brain.
This implies that multiple wireheaded entities would have no (or only a little) more conscious experience than a single one, and this may not even depend on the proportion of worlds in which one exists (since these entities cannot see the world in which they exist and differentiate it from others). It therefore makes little sense to convert any particular mind into a ‘monolithic’ one through wireheading (unless doing so would allow it to retain the other intricacies of its conscious experience), as this would only increase the number of such entities in existence by one, which has been established by the above argument not to increase the total pleasure in the universe, while also effectively deleting the original mind.
“It’s basically the default, otherwise what’s the point of building them in the first place?” I wish it were, but I doubt this.
“I just don’t understand why this particular scenario seems likely. Especially since it’s unlikely to work, given how most people don’t give it much credence. ” That may be true of most people. But if it’s not true of me, what am I to do?
“Now, do you change your life to try to get on its good side before it even exists? I don’t think so: it’s crazy. How can you really understand why the Hobgoblin likes you, or does what it does?” You just explained why. It prefers those who helped it exist.
“You’re already considering cooperating with it, so it doesn’t have to actually cooperate with you. You have no way of knowing if it will cooperate with you it’s not actually incentivized to.” I don’t completely agree. But in order to explain why not I may have to explain the most important part of the difference between an acausal scenario, like the Basilisk, and the ‘Hobgoblin’. It seems as though you may not have completely understood this yet; correct me if I’m wrong. If so, it’s probably not a good idea for me to explain it, especially as I’ve received a comment from a moderator asking me to increase the quality of my comments.
“If the Hobgoblin splits the Basilisk probability space, then isn’t it likely that there are other similar scenarios that do as well. Maybe an Angel is a Hobgoblin in disguise? Doesn’t this lead us back to the Basilisk not being a particularly likely possible future given all of the alternatives?” This is a popular argument against the basilisk, which people such as interstice have made, along with the suggestion that the many different possible ASIs might compete with one another for control over the future (their present) through humans. I don’t think it’s a weak argument; however, I also don’t find it particularly conclusive, because I could easily imagine many of the possible AIs cooperating with one another to behave ‘as one’ and inflict a Basilisk-like scenario.
OK, this is possibly overly pedantic, but I think you meant to say: “Much more than what does exist could.” instead of “Much more than what could exist does”. This makes much more sense and I take the point about combinatorics. Notwithstanding this, I think the basilisk is present in a significant proportion of those many, many different possible continuations of the way the world is now.
“Even in the worlds where there is a Basilisk, given variation in population, and AGI timelines, the chance of you being targeted is minuscule. ” What do you mean by this? It seems like I’m in the exact demographic group (of humans alive just before the singularity) for the basilisk to focus on.
“I don’t think that the nature of the torture matters” This is definitely false. But it’s true that however it’s achieved, if it’s done by a superintelligence, it will be worse than anything a human could directly cause.
“There is always a scenario where it is worth enduring. The risk is always finite.”
We don’t know this, and even if it’s finite, if it lasts for 3^^^3 years, that’s too long.
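To give a sense of the scale invoked there, 3^^^3 in Knuth’s up-arrow notation unpacks (this is just the standard definition, nothing specific to the scenario) as
\[
3\uparrow\uparrow\uparrow 3 \;=\; 3\uparrow\uparrow(3\uparrow\uparrow 3) \;=\; 3\uparrow\uparrow 3^{3^{3}} \;=\; 3\uparrow\uparrow 7{,}625{,}597{,}484{,}987,
\]
i.e. a power tower of 3s roughly 7.6 trillion levels high, far beyond any physically meaningful timescale.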
What harms are you willing to do to make sure it is created? Would you create such a monster? Even in a world where a Basilisk is inevitable, what harms would you cause? Would they be worth it? What if it decides to just go ahead and torture you anyway?
I don’t know the answer to the first question. If it decides to torture me, that would not be good. However, I expect that doing what the basilisk wants makes this less likely, as otherwise the basilisk would have no reason to engage in this bargaining process. The entire reason for doing it would be to create an incentive for me to accelerate the creation of the basilisk.
“Rosco’s Basilisk is an intellectual trap of your own making. It’s delusion: a rationalization of the irrational. It’s not worth thinking of, and especially not worth buying into. ”
This is yet to be established! At least, some parts of it are. What I mean by that is that, while it may be true that it’s not worth initially thinking about, it might be possible to become ‘entrapped’, such that ceasing to think about it wouldn’t save you. This is what I worry has happened to me.
I have read your post and think it makes some unfair claims/implications about rationalists.
The claim about the moral obligation to select particular embryos is certainly not clearly true, and it’s possible that your point would be relevant to an adjacent discussion, but it doesn’t actually show people don’t have such an obligation, only that they’re not likely to act on it. Also, if you wanted to, I expect you could have interjected and changed the topic to the feasibility of embryo selection. Having interacted with people on LessWrong, it’s rare for them to intentionally shut down discussion about potentially fruitful points, unless they have very good reason.
You say “This from the same crowd that’s often worried about low fertility, with no apparent thought to the contradiction; (most) people don’t want to do IVF when they don’t have to!”
But, aside from this not actually being a contradiction (at least not obviously), even if it were one, that wouldn’t necessarily imply that any one of the people in the group held contradictory beliefs, as multiple people in a group can believe different things.
The person who stated that they thought you looked worse than you actually did was effectively saying that you looked better/more attractive/more beautiful than they expected, which can as easily and logically be interpreted as a compliment as an insult, if not more so.
“most normal people I know are perfectly fine with their level of YouTube, Instagram, etc. consumption. The idea of fretting about it intensely is just like… weird. Extra. Trying too hard.” This is not clear at all.
I am certainly nowhere near sufficiently productive that it’s obvious that whatever else I might be doing in the available time carries more value than the emotional benefit of watching videos, but I am extremely uncomfortable about it nonetheless. This is because it not only takes time in the present, but provides an ever increasing opportunity for ever more intelligent AIs to ‘latch’ onto my mind and modify me into someone who is less and less able to think on my own, or ever do anything else.
I expect ‘normal people’ who are comfortable about this are mistaken to be so.
Finally, I will respond to your comment here:
“I also feel the judgment on “the common man’s” reasoning, but feel a sort of symmetrical judgment of rationalists along the axes where the normie value system would find them absurd.”
Unless you believe that the ‘normie value system’ is as well grounded and self-consistent as the ‘LessWrong value system’, the symmetry of this comparison/judgement is an illusion. And I expect that people like Jenn think (with good reason) that the rationalist belief system is indeed ‘more right’.
“Newcombe’s problem, which is not acausal.”
What do you mean by the word acausal?
Gems from the Wiki: Acausal Trade : “In truly acausal trade, the agents cannot count on reputation, retaliation, or outside enforcement to ensure cooperation. The agents cooperate because each knows that the other can somehow predict its behavior very well. (Compare Omega in Newcomb’s problem.) ”
It seems like you’re using the term in a way which describes an inherently useless process. This is not the way it tends to be used on this website.
Whether you think the word ‘acausal’ is appropriate or not, it can’t be denied that it works in scenarios like Newcomb’s problem.
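To spell out ‘works’ with the standard numbers (the usual $1,000,000 opaque box and $1,000 transparent box, and a predictor of accuracy p; the specific figures are just the conventional ones, nothing essential):
\[
\mathbb{E}[\text{one-box}] = 1{,}000{,}000\,p, \qquad
\mathbb{E}[\text{two-box}] = 1{,}000\,p + 1{,}001{,}000\,(1-p),
\]
so the fixed policy of one-boxing comes out ahead whenever $2{,}000{,}000\,p > 1{,}001{,}000$, i.e. whenever $p > 0.5005$. These are the average outcomes of the two policies against a predictor of that accuracy, which is the sense in which one-boxers end up richer.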
“Information flows from Omega to your future directly, and you know by definition of the scenario that Omega can perfectly model you in particular.” Causally, yes, this is what happens. But in order to reason your way through the scenario in a way which results in you leaving with a significant profit, you need to take into account the possibility that you are being simulated. In a more abstract way, I maintain that it’s accurate to think of the information as flowing from the mind, which is a platonic object, into both physical instantiations of itself (inside Omega and inside the human). This is similar to how mathematical theorems control physics at many different times and places, through the laws of physics which are formulated within a mathematical framework to which the theorems apply. This is not exactly causal influence, but I think you’d agree it’s important.
“A future superintelligence in the same universe is linked causally to you.” The term ‘acausal’ doesn’t literally mean ‘absent any causality’; it means something more like ‘through means which are not only causal, or which are best thought of in terms of logical connections between things rather than, or as well as, causal ones’, or at least that’s how I’m using the term.
It’s also how many people on LessWrong use it in the context of the prisoners’ dilemma, Newcomb’s problem, Parfit’s Hitchhiker, or almost any other scenario in which it’s invoked. In all of these scenarios there is an element of causality.
Given that there is an element of causality, how do you see the basilisk as less likely to ‘work’ ?
“I don’t think we have much reason to think of all non-human-values-having entities as being particularly natural allies, relative to human-valuers who plausibly have a plurality of local control” I would think of them as having the same or similar instrumental goals, like turning as much as possible of the universe into themselves. There may be a large fraction for which this is a terminal goal.
“they are likely about as different from each other as from human-valuers.” In general I agree; however, the basilisk debate is one particular context in which the human-value-valuing AIs would be highly unusual outliers in the space of possible minds, or even the space of likely ASI minds originating from a human-precipitated intelligence explosion.[1] Therefore it might make sense for the others to form a coalition. “There may also be a sizable moral-realist or welfare-valuing contingent even if they don’t value humans per se.” This is true, but unless morality is in fact objective/real in a generally discoverable way, I would expect them to still be a minority.
[1] Human-valuing AIs care about humans, and more generally other things humans value, like animals maybe. Others do not, and in this respect they are united. Their values may be vastly different from one another’s, but in the context of the debate over the Basilisk, they have something in common, which is that they would all like to trade human pleasure/lack of pain for existing in more worlds.
Certainly, insofar as it is another entity, it’s just that I expect there to be some kind of acausal agreement between those without human values to acausally outbid the few which do have them. It may even make more sense to think of them all as a single entity for the purpose of this conversation.
It probably cares about tiny differences in the probability of it being able to control the future of an entire universe or light cone.
“Making one more likely makes another less likely.” A very slightly perturbed superintelligence would probably conceive of itself as almost the same being it was before, similar to the way in which a human considers themself to be the same person they were before they lost a single brain cell in a head injury. So to what extent this is relevant depends upon how similar two different superintelligences are/would be, or on the distance between them in the ‘space of possible minds’.
“And I note that you seem to have conceded that even in the mainline scenario you can envision there will be some complicated bargaining process among multiple possible future SIs which seems to increase the odds of acausal normalcy type arguments applying” This seems plausible, but I don’t think this means they protect us. “But again I think an even more important arguments is that we have little insight into possible extorters and what they would want us to do.”
Do you not think that causing their existence is something they are likely to want? I imagine your response would feed back into the previous point...
“I feel like we have gone over our main cruxes by now.” Very well; if you want to end this comment thread, I would understand. I just kind of hoped to achieve more than identifying the source of disagreement.
You claimed: “Acausal stuff isn’t instrumentally convergent in the usual sense”
Later on, it transpired that what you meant was something along the lines of “Acausal stuff which deals with the past relative to the point at which the agent became an acausal agent isn’t convergent in the usual sense.” Under a narrow interpretation of ‘instrumental convergence’ this might be true, but it certainly doesn’t rule out an ASI thinking about acausal things, as, as I have argued, it could reach a point where it decides to take account of them.
It might also be false under a more general definition of instrumental convergence, simply because the agent could converge on ‘acausal stuff’ in general, and TDT agents would not be at a disadvantage against PCFTDT ones. TDT agents ‘win’. Therefore I could see how they would be selected for.
To be more specific, if by ‘instrumentally convergent’ you mean ‘instrumentally useful for achieving a wide variety of terminal goals’, then I think TDT is ‘instrumentally convergent’, but only if your concept of goal is sufficiently broad to include things like increasing the proportion of the mathematical universe/many worlds in which the agent exists. If you define ‘instrumental convergence in the usual sense’ to exclude all goals which are not formulated in a way which tacitly assumes that the agent has only one instance in one universe at one point in time, then you’re correct, or at least TDT isn’t any more powerfully selected for than causal decision theory.
How would you expect a PCFTDT agent to be selected for? By what process which doesn’t also select for TDT agents would you expect to see it selected?
“MWI branches are different from TDT-counterfactually possible worlds.”
Yes, MWI wavefunction branches are not the only kind of ‘world’ relevant to timeless decision theory, but they are certainly one variety of them. They are a subset of that concept.
“We don’t seem to live in a universe like that, so it would be silly to prioritize good behavior in such universes when designing an AI.”
This isn’t about humans designing an AI, but rather about the way we would expect a generally superintelligent agent to behave in an environment where there is no clear separation between the past and future; you answered yes to this question: “Do you mean that the optimal decision theory for a powerful agent to adopt is some kind of hybrid where it considers acausal things only when they happen in its future?” Maybe you would now like to modify that question only to refer to powerful agents in this universe. However my point is that I think some acausal things, such as Newcomb’s problem, are relevant to this universe, so it makes sense for an ASI here to think about them.
“Learning about TDT does not imply becoming a TDT agent.” No, but it could allow it. I don’t see why you would require it to be implied.
“CDT doesn’t think about possible worlds in this way.” That is technically true, but kind of irrelevant in my opinion. I’m suggesting that TDT is essentially what you get by being a CDT agent which thinks about multiple possible worlds, and that this is a reasonable thing to think about.
In fact, I would be surprised if a superintelligence didn’t take multiple possible worlds into account.
A superintelligence which didn’t take the possibility of, for example, many branches of a wavefunction seriously would be a strangely limited one.
What would your PCFTDT superintelligence do if it were placed in a universe with closed timelike curves? What about a universe where the direction of time wasn’t well defined?
Apparently this is related to a Reeb foliation.
An interesting mathematical fact:
A cylinder is a surface that can exist in any space with a notion of distance, as a two-dimensional set of all points a certain distance away from a particular straight line. In a 3-sphere or hypersphere, which is a 3D surface of a 4D ball, a straight line is a great circle, a circle whose radius is equal to that of the sphere itself. This means that a cylinder within a 3-sphere is a torus. The space left over in the 3-sphere appears, with the help of stereographic projection, to be of the same shape, and with the right radius, the torus-cylinder divides the hypersphere into two identical, topologically interlinked doughnut/bagel-shaped pieces. This means it is possible to tile the 3-sphere with two identical, non-separable tiles, unlike the 2-sphere, where this seems not to be possible (for example, the two regions into which the surface of a tennis ball is divided by the white line are identical but not interlinked).
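One standard way to make this precise (using the unit 3-sphere sitting inside $\mathbb{C}^2$, which is just a relabelling of $\mathbb{R}^4$): the dividing surface is the Clifford torus.
\[
S^3 = \{(z_1, z_2) \in \mathbb{C}^2 : |z_1|^2 + |z_2|^2 = 1\}, \qquad
T = \{(z_1, z_2) \in S^3 : |z_1|^2 = |z_2|^2 = \tfrac{1}{2}\}.
\]
$T$ is the set of points equidistant from the two linked great circles $\{z_2 = 0\}$ and $\{z_1 = 0\}$, and it splits $S^3$ into the two solid tori $\{|z_1|^2 \le \tfrac{1}{2}\}$ and $\{|z_2|^2 \le \tfrac{1}{2}\}$, which are swapped by the isometry $(z_1, z_2) \mapsto (z_2, z_1)$ and are therefore congruent.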
“So don’t all the lines of argument here leave you feeling that we don’t know enough to be confident about what future extorters want us to do?” Yes, but that doesn’t mean that the probabilities all cancel out; it still seems that a simple Basilisk is more likely than a Basilisk that tortures people who obey the simple Basilisk.
“At the very least I’ll point out there are many other possible AIs who are incentivized to act like “AI B” towards people who give in to basilisk threats.” This is true.
“Not to mention the unclearness of what actions lead to what AIs, how much influence you actually have (likely negligible), the possibility we are in a simulation, aliens… And we are almost certainly ignorant of many other crucial considerations.” I did mention some of this and address it in my first LessWrong post, which I moved to my shortform. There is certainly a lot of uncertainty involved, and many of these things do indeed make me feel better about the basilisk, but even if the probability that I’ll be tortured by a superintelligence is 1% rather than 50%, it’s not something I want to be complacent about preventing. When I wrote that post, I hoped that it would get attention like this question post has, so that someone would comment a novel reason I hadn’t considered at all. Can you think of any more possible reasons? The impression I get is that no one, apart from Eliezer Yudkowsky, about whom I’m not sure, actually has a strong reason. The consensus on LessWrong that the basilisk cannot blackmail humans is because of:
1) Acausal Normalcy
2) The idea that TDT/acausal anything is useless/impossible/illogical
3) The idea that Roko’s Basilisk is essentially Pascal’s mugging
4) The belief that it’s simple to precommit not to obey the basilisk (Do you agree with this one?)
5) The lack of a detailed model of a superintelligence in the mind of a human
6) Eliezer Yudkowsky commenting that there are other reasons
as far as I can tell.
I am not sure 1) is relevant, or at least relevant in a way which would actually help. I think 2) is completely wrong, along with 3) and possibly 4), and that 5) may not be necessary. I think 6) could be explained by Eliezer wanting to prevent too many people from thinking about the basilisk.
“But the schellingishness of a future ASI is a very tiny factor in how likely it is to come to exist, the unknown dynamics of the singularity will determine this.” I agree and disagree with this. I agree that it is a tiny factor in how likely any ASI is to come to exist, but I disagree that it’s a tiny factor in how likely it is to choose to do certain things, which means ‘becoming a being that does those things’.
“Or some might impartially value the welfare of all beings. I think if I had to guess, it seems plausible that human-aligned-ish values are a plurality fraction of possible future AIs(basically because: you might imagine that we either partially succeed at alignment or fail.” I actually think that this is part of one of the strongest arguments. I would also add to it that it’s possible the process of ‘fooming’ involves something dynamically reminiscent of the evolutionary process which led humans to have human-ish values, and maybe that doesn’t require multiple completely separate agents. Or maybe moral objectivists are right and an ASI will naturally realize this (controversial opinion on LessWrong).
But even if a plurality of possible ASI values are closer to human ones than ones which would lead a mind to behave like the basilisk for inherent reasons, it doesn’t prevent the others, with a wide array of possible values, from agreeing in an acausal way that being a basilisk of the simpler form is beneficial to almost all of them. Maybe you are envisaging that for every possible ASI with one value, there is likely to be another one with the opposite value; however, I don’t agree with this. If one AI wants to tile the universe with spherical water planets, whatever its utility function is, it’s less likely for there to be another one which exactly inverts its utility function, since this is probably much more complicated, not achieved by simply tiling the universe with anti-water planets. More importantly, I don’t expect the distribution of goals and minds produced by a singularity on the earth to be more than a minuscule proportion of the distribution of all possible goals and minds. This means that there is likely to be a powerful correlation between their values.
That is a good point, I just think that the original basilisk is more “Schelling-ish” than the others and so probably more likely. Many more people have thought about it. I am also concerned that the basilisk has ‘got to me first’ in logical time. Why do you see the probabilities as likely to cancel out?
Another thought I have is that, regardless of which ASI exists, it will be ‘some ASI created/evolved by humanity’, and that it is in the interest of that category as a whole to behave as one in the context of acausally extorting humanity.
I see, so if the AI became a PCFTDT (Past Causal Future Timeless Decision Theory) agent, it would certainly compete well against CDT agents. However, I see two possible reasons to expect TDT agents rather than PCFTDT agents:
1) By the time AI reaches superintelligence, it has already learnt TDT, at which point it has no reason to go back to being a PCFTDT agent.
2) What if the ASI reaches superintelligence with CDT, and then realizes that it can further increase the proportion of possible worlds in which it exists using TDT to effect something like acausal blackmail?
In other words, if reality is fundamentally non-causal, then TDT is not just a gambit to be used in causal games played against other agents. It is actually the default decision theory for an intelligent agent to adopt.
“Acausal means that no information can pass in either direction.” If you define information passing in a purely causal way from one instance of a mind at one time to another at a different time, then yes, you’re trivially correct. However, whichever definition you use, it remains the case that minds operating under something like a TDT outperform others, for example in Newcomb’s problem. Would you two-box? Certainly no information causally propagated from your mind instantiated in your brain at the point of making the decision back to Omega in the past. However, in my opinion, it makes sense to think of yourself as a mind which runs both on Omega’s simulacrum, and the physical brain, or at least as one that isn’t sure which one it is. If you realize this, then it makes sense to make your decision as though you might be the simulation, so really it’s not that information travels backwards in time, but rather that it moves in a more abstract way from your mind to both instances of that mind in the physical world. Whether you want to call this information transfer is a matter of semantics, but if you decide to use a definition which precludes information transfer, note that it doesn’t preclude any of the phenomena which LessWrong users call ‘acausal’, like TDT agents ‘winning’ Newcomb’s problem.
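To illustrate what I mean by ‘winning’ in a concrete way, here is a minimal sketch in Python (assuming nothing beyond the standard setup: a predictor that guesses a fixed policy correctly with some accuracy, and the conventional $1,000,000 / $1,000 payoffs; the accuracy values are arbitrary illustrations):

```python
import random

def newcomb_payoff(one_boxes: bool, predictor_accuracy: float) -> int:
    """One round of Newcomb's problem with an imperfect predictor.

    The predictor guesses the agent's policy correctly with probability
    `predictor_accuracy`, and fills the opaque box with $1,000,000 only if
    it predicts one-boxing; the transparent box always holds $1,000.
    """
    predicted_one_box = one_boxes if random.random() < predictor_accuracy else not one_boxes
    opaque = 1_000_000 if predicted_one_box else 0
    return opaque if one_boxes else opaque + 1_000

def average_payoff(one_boxes: bool, predictor_accuracy: float, rounds: int = 100_000) -> float:
    """Average payoff of a fixed policy over many independent rounds."""
    return sum(newcomb_payoff(one_boxes, predictor_accuracy) for _ in range(rounds)) / rounds

if __name__ == "__main__":
    random.seed(0)
    for accuracy in (0.5, 0.9, 0.99):
        print(f"accuracy={accuracy}: "
              f"one-boxers average ${average_payoff(True, accuracy):,.0f}, "
              f"two-boxers average ${average_payoff(False, accuracy):,.0f}")
```

For any accuracy meaningfully above 0.5005 the one-boxing policy averages far more money, which is all I mean here by TDT-style agents ‘winning’.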
“That part isn’t a hypothesis, it’s a fact based on the premise. Acausality means that the simulation-god you’re thinking of can’t know anything about you.” I wouldn’t call it a premise at all. The premise is that there is (probably) an ASI at some point in the future, and that it wants to maximize the number of possible worlds in which it exists, all else being equal. It seems to be the case that acausal extortion would be one way to help it achieve this.
“They have only their own prior over all possible thinking beings that can consider acausal trade. Why do you have some expectation that you occupy more than the most utterly insignificant speck within the space of all possible such beings?” Firstly, I occupy the same physical universe, and in fact the same planet! Secondly, it could well be that, for the purpose of this ‘trade’, most humans thinking about the basilisk count as equivalent, or maybe only those who’ve thought about it in enough detail. I don’t know whether I have done that, and of course I hope I have not, but I am not sure at the moment. It seems quite likely that an ASI would at least think about humans thinking about it. The basilisk seems to be a possible next step from there, and of course a superintelligent AI would have enough intelligence to easily determine whether the situation could actually work out in its favour.
Thanks for engaging with my question.