Alignment work in anomalous worlds


Though I do not personally believe this post is particularly hazardous, works on related topics such as the Meta-Mega Crossover or The Hour I First Believed bear such warnings, so I'm mentioning that this post relates to things like {alien simulations of us where the aliens interfere with the simulation} and (possibly acausal) trades across the multiverse. Remember that not reading this post is a real option which, as an agent, you can take if you think it is preferable.

Suppose people gain magic powers, or it starts raining cotton candy, or some other strange phenomenon occurs which implies that we live in a strange world probably manipulated by aliens — whether physically or by being a simulation. What shall one do in such a world?

The answer is, as usual, to continue working on solving AI alignment.

This might seem strange — surely in such a world there's a lot more to worry about than AI alignment, no? Well, no.

Trade with aliens in the far future is not a zero-sum game. It might be that we encounter an alien intelligence which really likes it when it rains cotton candy on various planets containing copies of civilizations (including pre-singularity ones), even in simulations; and we happen to be one of the worlds they're simulating. By being the kind of agents who still create a nice aligned-AI utopia when it rains cotton candy, we get to have utopia even in those simulations.

I don't think there are many aliens who have the ability to {run simulations of us, except it rains cotton candy} but not the ability to {steer us to do whatever they want, if they care to}. So the worlds in which your actions matter are the ones in which the aliens have decided to leave us in some way in control of our future, and only interfere in ways that don't get too much in the way of that. And we still want our future to be a nice utopia rather than everyone dead forever. So we should take the actions that steer the steerable subset of worlds-we-inhabit.

You could think of this as a way for simulated and simulating civilizations to engage in positive-sum trade: suppose some civilization wants to run copies of alien (to itself) civilizations' (plausibly a distribution of) past home-planets, except that it rains cotton candy on them, and which are otherwise left alone. If we are the kind of civilization which, in those situations, would fill a bunch of our aligned-AI-utopia future lightcone with cotton candy rain, they might be happier to run a cotton-candy-rained-on version of us, rather than of some other civilization.

(There's some logical decision theory at play here: once you know you're getting simulated, you might be like "aha, I get to be run now, so actually I can turn around and not bother to solve alignment!" — but no, your civilization only gets simulated if it's the kind to still solve alignment anyways.)
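To make that decision-theoretic point concrete, here's a minimal toy model. It's a sketch of my own, not anything from the post: the `simulator_instantiates` rule and all the payoff numbers are illustrative assumptions. The point it illustrates is just that a policy which "defects" once it notices it's simulated never gets simulated in the first place, so it ends up with less utopia overall.

```python
# Toy model of the logical-decision-theory point above.
# All numbers and rules are illustrative assumptions, not claims from the post.

def simulator_instantiates(policy) -> bool:
    # The simulator predicts the civilization's policy and only runs copies
    # of civilizations that would still build aligned utopia when it rains
    # cotton candy (i.e. it evaluates the policy under anomalous conditions).
    return policy(anomalous=True) == "build aligned utopia"

def utopia_measure(policy) -> float:
    # "Authentic" worlds: assume one unit of measure where the policy plays
    # out under normal (non-anomalous) conditions.
    authentic = 1.0 if policy(anomalous=False) == "build aligned utopia" else 0.0
    # Simulated worlds: these only exist if the simulator chooses to run this
    # civilization; assume they contribute 0.5 extra measure of utopia.
    simulated = 0.5 if simulator_instantiates(policy) else 0.0
    return authentic + simulated

def committed(anomalous: bool) -> str:
    # Builds aligned utopia whether or not the world looks anomalous.
    return "build aligned utopia"

def defector(anomalous: bool) -> str:
    # Gives up on alignment work as soon as the world looks anomalous
    # ("aha, I'm being simulated, so it doesn't matter").
    return "give up" if anomalous else "build aligned utopia"

print(utopia_measure(committed))  # 1.5 -- gets the simulated worlds too
print(utopia_measure(defector))   # 1.0 -- the simulator never runs it
```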

And while those worlds are "only simulations", it's not guaranteed that we get anything else! We might get to be an alien's copy of an "authentic" earth, but there might also be no "authentic us" or even no "authentic earth" at all, and the aliens are just running us out of some distribution of civilizations they find interesting to look at. Note that there is no threat involved here; the aliens aren't going "play along or we'll shut you down", they're going "play along and we'll causate you into existence at all, when you wouldn't have existed otherwise". I'm happy to grab those bits of utopia, or at the very least tell the aligned superintelligence we build in worlds-where-we-survive "deal with this situation in whatever way satisfies our values".

Alternatively: as an authentic earth, we might at any moment be about to get killed by some unaligned AI — ours or an alien's — which later sells a (possibly very lossily compressed) copy of our civilization to an alien civilization willing to run a rescue-simulation of us (in exchange for giving the AI that killed us a bunch of compute/negentropy). Some of those rescue-simulation aliens might leave us in charge of our own world, but still have preferences over what we make of it. In which case, again, by creating an AI aligned to build a combination of {the utopia we want} and {things the rescue-sim aliens like}, we make it the case that the rescue-sim aliens saved us instead of saving some other civilization, or instead of saving a branch of earth in which we build clippy and it creates a combination of paperclips and {things the rescue-sim aliens like}.

Basically, reliably building aligned-AI utopia in as many circumstances as possible maximizes how much realityfluid ends up filled with our utopia: we get those utopias in simulations as well, and/or our future selves get to make trades that are still beneficial to us, because our past selves can be counted on to build utopia even when strange circumstances occur.

So if a UFO lands in your backyard and aliens ask you if you want to go on a magical (but not particularly instrumental) space adventure with them, I think it's reasonable to very politely decline, and get back to work solving alignment.

[EDIT: As some comments point out, if interacting with the anomalous stuff seems plausibly like it could help save the world, then for sure, go for it. The point of my post was to not give up on saving the world just because anomalous things happen.]