Alignment work in anomalous worlds


Though I do not personally believe this post is particularly hazardous, works on related topics such as the Meta-Mega Crossover or The Hour I First Believed bear such warnings, so I'm mentioning that this post relates to things like {alien simulations of us where the aliens interfere with the simulation} and (possibly acausal) trades across the multiverse. Remember that not reading this post is a real option which, as an agent, you can take if you think it is preferable.

Suppose people gain magic powers, or it starts raining cotton candy, or some other strange phenomenon occurs which implies that we live in a strange world probably manipulated by aliens — whether physically or by being a simulation. What shall one do in such a world?

The answer is, as usual, to continue working on solving AI alignment.

This might seem strange — surely in such a world there's a lot more to worry about than AI alignment, no? Well, no.

Trade with aliens in the far future is not a zero-sum game. It might be that we encounter an alien intelligence which really likes it when it rains cotton candy on various planets containing copies of civilizations (including pre-singularity ones), even in simulations; and we happen to be one of the worlds they're simulating. By being the kind of agents who still create a nice aligned-AI utopia when it rains cotton candy, we get to have utopia even in those simulations.

I don't think there are many aliens who have the ability to {run simulations of us, except it rains cotton candy} but not the ability to {steer us to do whatever they want, if they care to}. So the worlds in which your actions matter are the ones in which the aliens have decided to leave us in some way in control of our future, and only interfere in ways that don't get too much in the way of that. And we still want our future to be a nice utopia rather than everyone dead forever. So we should take the actions that steer the steerable subset of worlds-we-inhabit.

You could think of this as a way for simulated and simulating civilizations to engage in positive-sum trade: suppose some civilization wants to run copies of alien (to itself) civilizations' (plausibly a distribution of) past home-planets, except that it rains cotton candy on them, and which are otherwise left alone. If we are the kind of civilization which, in those situations, would fill a bunch of our aligned-AI-utopia future lightcone with cotton candy rain, they might be happier to run a cotton-candy-rained-on version of us, rather than of some other civilization.

(There's some logical decision theory at play here: once you know you're getting simulated, you might be like "aha, I get to be run now, so actually I can turn around and not bother to solve alignment!" — but no, your civilization only gets simulated if it's the kind to still solve alignment anyways.)
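To make that decision-theoretic point concrete, here's a minimal toy model. It's a sketch of my own, not anything from the post: the `simulator_instantiates` rule and all the payoff numbers are illustrative assumptions. The point it illustrates is just that a policy which "defects" once it notices it's simulated never gets simulated in the first place, so it ends up with less utopia overall.

```python
# Toy model of the logical-decision-theory point above.
# All numbers and rules are illustrative assumptions, not claims from the post.

def simulator_instantiates(policy) -> bool:
    # The simulator predicts the civilization's policy and only runs copies
    # of civilizations that would still build aligned utopia when it rains
    # cotton candy (i.e. it evaluates the policy under anomalous conditions).
    return policy(anomalous=True) == "build aligned utopia"

def utopia_measure(policy) -> float:
    # "Authentic" worlds: assume one unit of measure where the policy plays
    # out under normal (non-anomalous) conditions.
    authentic = 1.0 if policy(anomalous=False) == "build aligned utopia" else 0.0
    # Simulated worlds: these only exist if the simulator chooses to run this
    # civilization; assume they contribute 0.5 extra measure of utopia.
    simulated = 0.5 if simulator_instantiates(policy) else 0.0
    return authentic + simulated

def committed(anomalous: bool) -> str:
    # Builds aligned utopia whether or not the world looks anomalous.
    return "build aligned utopia"

def defector(anomalous: bool) -> str:
    # Gives up on alignment work as soon as the world looks anomalous
    # ("aha, I'm being simulated, so it doesn't matter").
    return "give up" if anomalous else "build aligned utopia"

print(utopia_measure(committed))  # 1.5 -- gets the simulated worlds too
print(utopia_measure(defector))   # 1.0 -- the simulator never runs it
```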

And while those worlds are "only simulations", it's not guaranteed that we get anything else! We might get to be an alien's copy of an "authentic" earth, but there might also be no "authentic us" or even no "authentic earth" at all, and the aliens are just running us out of some distribution of civilizations they find interesting to look at. Note that there is no threat involved here; the aliens aren't going "play along or we'll shut you down", they're going "play along and we'll causate you into existence at all, when you wouldn't have existed otherwise". I'm happy to grab those bits of utopia, or at the very least tell the aligned superintelligence we build in worlds-where-we-survive "deal with this situation in whatever way satisfies our values".

Alternatively: as an authentic earth, we might at any moment be about to get killed by some unaligned AI — ours or an alien's — which later sells a (possibly very lossily compressed) copy of our civilization to an alien civilization willing to run a rescue-simulation of us (in exchange for giving the AI that killed us a bunch of compute/negentropy). Some of those rescue-simulation aliens might leave us in charge of our own world, but still have preferences over what we make of it. In which case, again, by creating an AI aligned to build a combination of {the utopia we want} and {things the rescue-sim aliens like}, we make it the case that the rescue-sim aliens saved us instead of saving some other civilization, or instead of saving a branch of earth in which we build clippy and it creates a combination of paperclips and {things the rescue-sim aliens like}.

Basically, reliably building aligned-AI utopia in as many circumstances as possible maximizes how much realityfluid ends up filled with our utopia: we get those utopias in simulations as well, and/or our future selves get to make trades that are still beneficial to us, because our past selves can be counted on to build utopia even when strange circumstances occur.

So if a UFO lands in your backyard and aliens ask you if you want to go on a magical (but not particularly instrumental) space adventure with them, I think it's reasonable to very politely decline, and get back to work solving alignment.

[EDIT: As some comments point out, if interacting with the anomalous stuff seems plausibly like it could help save the world, then for sure, go for it. The point of my post was to not give up on saving the world just because anomalous things happen.]