Mitigating the damage from unaligned ASI by cooperating with aliens that don’t exist yet
Epistemic status: Pretty sure the idea is sound, but its feasibility is unfortunately a bit questionable.
Humanity must, as soon as possible, send a message to civilizations beyond Earth containing the following:
Everything we currently know about the alignment problem, together with the bare minimum of AI knowledge necessary to understand it (without thereby drastically boosting their capabilities as well);
Our precommitment that, if we solve it and create an aligned ASI, and later, while expanding through the cosmos, meet the expansion bubble of another ASI:
if the ASI is not aligned with the values of the civilization that created it, we will “defect” against it;
if the ASI is aligned, but the civilization that created it knew about our precommitment without making (and embedding in the ASI) the same precommitment themselves, we will still defect;
if the ASI is aligned and the civilization that created it did make the same precommitment (verifiable from the ASI’s source code when inspected by ours), *then* we will cooperate and willingly share the universe with them.
Why share alignment knowledge with aliens?
We care about their welfare, and more selfishly, about the possibility of someday meeting them. If an unaligned ASI kills them all, that would be a great loss to the universe and to us.
We want to minimize the amount of the universe eventually controlled by unaligned ASIs because their values tend to be absurd and their very existence is abhorrent to us.
Unaligned ASIs, even if made by aliens, pose a potential existential threat to our civilization (if they end up controlling more of the universe than us by the time we meet).
Most sophont species capable of forming a technological civilization advanced enough to produce ASI are likely to be highly social, and as a result to have values significantly more similar to ours than chance would predict (for instance, valuing life, fairness, beauty, etc.). If they have an aligned ASI helping them maximize those values, that advances some of our values as well.
Why precommit to defect against unaligned ASIs when it’s superrational to cooperate? Warfare between ASIs risks astronomical waste!
Yes, that’s the point. This is a signal to any unaligned ASIs that do get made elsewhere in the universe that they may benefit from self-modifying to become at least partly aligned in order to avoid our wrath. If the loss to their current unaligned values after doing that is less than the loss from whatever destruction or other disvalue we could impose upon them, then they will do so.
What does “defect” even mean in this context?
Either physical warfare or the creation of huge amounts of the other entity’s disvalue, if its utility function goes into the negative range. (Is there such a thing as anti-paperclips? If so, in a war with Clippy, we’d fill the universe with those. Maybe I should ask DALL-E Mini for an angry crying wojak meme in the shape of a paperclip.)
All right, but why try to coerce them into being aligned with their creator civ’s values? Why not our own?
They don’t know what our values are from this far away in spacetime and we couldn’t explain them. If we could, we’d already have solved a large part of the problem.
More importantly, if the creator civ receives our message they will think we are assholes for doing that and be less likely to make a precommitment that benefits us, which as I’ll show is crucial.
Okay, so how about this defecting against aligned non-precommitters thing? Isn’t that barbaric? They’re people!
Yes, and for that reason we may try to go easier on them than we would on an unaligned ASI. (And anyway, since when has “they’re people!” ever stopped humans from going to war?)
The problem is, if they don’t make that precommitment, they will benefit from our having made it (as it protects them from unaligned ASI), as well as from our gift of alignment knowledge, but give nothing in return. That is, failing to make the same precommitment is itself a kind of defection, and this is just our clear signal that we will not tolerate that.
How do we benefit from their solving the alignment problem and battling future unaligned ASIs? If we create an unaligned one, we’ll all be dead anyway!
Not quite. If we create an unaligned ASI that believes it is likely, in the future, to meet aligned ASIs that have precommitted to hostility toward it and that possess significant resources, then it would rather self-modify to become more aligned than risk annihilation, exactly mirroring the situation we want to put the alien ones in!
Thus, the more we do to increase the expected proportion of the future light cone which will be controlled by aligned precommitted ASIs, the more utility an unaligned ASI we make will be willing to sacrifice and set aside for us in the hope of avoiding their wrath.
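The tradeoff described above can be sketched as a toy expected-utility calculation. Every number and function name here is a hypothetical illustration, not a prediction:

```python
# Toy model: should an unaligned ASI self-modify toward partial alignment?
# All quantities are hypothetical placeholders for illustration.

def should_self_modify(p_hostile_meeting, p_lose_given_war, utility_kept_if_aligned):
    """Compare expected utility of staying unaligned vs. partially aligning.

    p_hostile_meeting: probability of eventually meeting a precommitted aligned ASI
    p_lose_given_war: probability of annihilation (utility -> 0) in that conflict
    utility_kept_if_aligned: fraction of current utility retained after self-modifying
    """
    # Staying unaligned: keep everything unless a hostile meeting is lost.
    ev_unaligned = 1.0 - p_hostile_meeting * p_lose_given_war
    # Self-modifying: sacrifice some utility up front, but avoid the war entirely.
    ev_aligned = utility_kept_if_aligned
    return ev_aligned > ev_unaligned

# The more of the light cone precommitted ASIs are expected to control,
# the higher p_hostile_meeting, and the more utility it pays to sacrifice:
print(should_self_modify(0.1, 0.9, 0.8))  # False: the threat is too unlikely
print(should_self_modify(0.5, 0.9, 0.8))  # True: losing 20% beats a 45% chance of losing everything
```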
This is also why we have to be just as willing to punish non-precommitters as we are to punish clippies. By failing to precommit they are increasing the amount of resources in future branches of the multiverse which will be controlled by unaligned ASIs (including our own), and should be held accountable for that.
So, essentially, this is a kind of acausal coordination with our cosmic neighbors who may not even exist yet, for mutual benefit?
Yup. That’s right. Freaky, eh? Though note, it’s not entirely acausal—we do have to actually send the message.
Oh, right. Wait… two more questions. One, how do you know there even are any cosmic neighbors? We see no evidence of extraterrestrial civilizations.
Well, yes, but that’s right now, and the further out in space you look the further back in time you are looking. I wouldn’t be surprised if we are among the first civilizations to ever evolve naturally in the universe. But I doubt we’re anything close to the last or the only. They may just be very sparse. As our grabby ASI expands into the universe, eventually it will meet other civilizations. It’s just a matter of when.
Won’t that sparseness impact the effectiveness of this method?
Yes, somewhat, but it depends on how our ASI discounts time. If it is concerned with events occurring in the distant future, then even if it thinks the density of ASI-capable civilizations is low in the universe (or more accurately, in the 3D slice of 4D spacetime on the surface of our ASI’s expansion bubble), the amount of resources they might control, and thus their impact on its utility function, will still be considerable.
“Resources controlled” is roughly proportional to the cube of the length of time since the other ASI was created, if we assume spherical expansion at as close to the speed of light as possible. If the probability of winning a war is assumed directly proportional to resources (I don’t know if this is exactly true, but let’s say so for now), then the odds are roughly t³ : u³, for t the length of time our ASI has existed when they meet and u the length of time theirs has.
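Under these assumptions (spherical near-lightspeed expansion, win probability proportional to resources controlled), the win probability reduces to a ratio of cubes. A minimal sketch:

```python
def win_probability(t, u):
    """Probability our ASI wins a conflict, assuming each ASI's resources grow
    as the cube of its age and win probability is proportional to resources."""
    return t**3 / (t**3 + u**3)

# Only the ratio t/u matters, not the absolute ages:
print(win_probability(2.0, 1.0))   # ≈ 0.889 (i.e., 8/9)
print(win_probability(2e9, 1e9))   # same value, billions of years later
```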
If we helped them build the ASI in the first place with our message, the odds will be greatly in favor of our ASI, as theirs will not have had much time to expand before ours arrives. (How much time it has had depends entirely on the speed of our ASI’s own expansion wave, which I do not know how to predict, though I expect all ASIs in the universe will likely have a similar speed once they max out the propulsion tech tree.) It will nonetheless be willing to sacrifice up to roughly t³/(t³ + u³) of its own resources (and thus current utility) by modifying its utility function to become more aligned with our values, in order to eliminate the risk of future annihilation.
The fun thing is that only the ratio of the cubes matters, not the absolute time lengths themselves, except to the extent that our ASI exponentially discounts future rewards. We should try to calibrate its discount rate, if we can, to be as slow as possible, to offset the possibility that its first meeting with an alien ASI capable of affecting its choices now will be millions or even billions of years in the future.
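Why the discount rate matters so much can be illustrated numerically. The rates below are arbitrary placeholders; the point is that only extremely slow exponential discounting leaves a billion-year-distant meeting with any weight at all:

```python
import math

def discounted_weight(meeting_time, discount_rate):
    """Weight an exponentially discounting agent places on an event
    occurring meeting_time years from now."""
    return math.exp(-discount_rate * meeting_time)

# A meeting a billion years out is invisible to even a mildly impatient agent:
for r in (1e-6, 1e-9, 1e-12):   # per-year discount rates (arbitrary examples)
    print(r, discounted_weight(1e9, r))
# At r = 1e-6 the weight underflows to ~0; at r = 1e-12 it stays near 1.
```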
Note of course that as we have no idea exactly how far away in spacetime the various ASI-making civilizations it will contact are, it will probably use some kind of simulation to average over all possible civilizations, and their choices about how to react to the precommitment message, and how aligned their ASI ends up being, and their interactions with one another, in order to determine how to react to this.
I probably will have to make some kind of simulation myself just to see if this kind of precommitment scenario has negative second-order effects I did not predict due to interactions between civilizations.
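A toy Monte Carlo along those lines might look like the following. Every probability and category here is an arbitrary placeholder, included only to show the shape such a simulation could take:

```python
import random

def simulate(n_civs=1000, p_receives_message=0.3, p_solves_alignment=0.5,
             p_precommits_given_message=0.6, seed=0):
    """Count how a population of civilizations splits among outcomes
    under arbitrary placeholder probabilities."""
    rng = random.Random(seed)
    outcomes = {"aligned_precommitted": 0, "aligned_only": 0, "unaligned": 0}
    for _ in range(n_civs):
        got_message = rng.random() < p_receives_message
        # Placeholder assumption: the message's alignment knowledge doubles
        # a civilization's chance of solving the problem.
        solves = rng.random() < (p_solves_alignment if got_message
                                 else p_solves_alignment / 2)
        if not solves:
            outcomes["unaligned"] += 1
        elif got_message and rng.random() < p_precommits_given_message:
            outcomes["aligned_precommitted"] += 1
        else:
            outcomes["aligned_only"] += 1
    return outcomes

print(simulate())
```

A real version would also need to model the civilizations' interactions with one another, which is exactly where unpredicted second-order effects would show up.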
All this sounds very reasonable. But there’s still the second of the two questions I mentioned earlier. And it’s kinda the most important.
Oh, sure, ask away.
HOW THE HECK DO YOU PLAN ON SENDING THE MESSAGE??? With current technology the longest range at which radio signals emitted by Earth are expected to remain legible is a few hundred light years—not millions or billions! Furthermore, how would you encode this complex message so that aliens could understand it, even if you could actually send the signal?
Technically, that’s two questions! But I’ll answer the second one first.
Use a starting message similar to that of CosmicOS to define an interpreter for a programming language rooted in lambda calculus, and… somehow… translate into that language the concepts of AI, and alignment, and values, and decision theory, and precommitments, and and and...
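For flavor, the interpreter core such a message would have to bootstrap can be genuinely tiny. Here is a minimal untyped lambda-calculus evaluator, purely a sketch and not CosmicOS’s actual encoding (it also skips capture-avoiding substitution, which is fine for the closed terms shown):

```python
# Minimal untyped lambda calculus. Terms are tuples:
# ('var', name), ('lam', name, body), ('app', func, arg).
# Normal-order evaluation by naive substitution; variable capture is
# not handled, which is adequate for the closed examples below.

def substitute(term, name, value):
    kind = term[0]
    if kind == 'var':
        return value if term[1] == name else term
    if kind == 'lam':
        if term[1] == name:          # bound variable shadows the substitution
            return term
        return ('lam', term[1], substitute(term[2], name, value))
    return ('app', substitute(term[1], name, value), substitute(term[2], name, value))

def reduce_once(term):
    """One normal-order beta-reduction step, or None if in normal form."""
    if term[0] == 'app':
        f, x = term[1], term[2]
        if f[0] == 'lam':
            return substitute(f[2], f[1], x)
        step = reduce_once(f)
        if step is not None:
            return ('app', step, x)
        step = reduce_once(x)
        if step is not None:
            return ('app', f, step)
    elif term[0] == 'lam':
        step = reduce_once(term[2])
        if step is not None:
            return ('lam', term[1], step)
    return None

def normalize(term, limit=1000):
    for _ in range(limit):
        step = reduce_once(term)
        if step is None:
            return term
        term = step
    raise RuntimeError("no normal form within step limit")

# (\x. x) y  reduces to  y
identity = ('lam', 'x', ('var', 'x'))
print(normalize(('app', identity, ('var', 'y'))))  # ('var', 'y')
```

The hard part, of course, is not the interpreter; it is the “somehow” in the translation step above.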
We could, uh. Use the LHC as a big radio antenna?? Make a huge mirror to orbit the sun and rotate to make the sun appear to blink or flash from some angles?? Seed the universe with extremophile bacteria in whose DNA the message is encoded in such a way that it cannot survive if even one base pair mutates?? (Gosh, that one wouldn’t even reach the speed of light...)
Tbh I was so busy deciding whether we should, that I didn’t stop to think about whether we could… this may actually require that we already have an aligned ASI in order to do it… weary emoji...
Well… at least if we manage as global humanity to precommit this way even without being able to send the message, it’s evidence for our ASI that other civilizations in the universe might converge on the same idea. That’s not much, but it’s something… right?
Oh, also, I suspect that this argument could be repurposed to work entirely on Earth, with “civilization” replaced by “AGI research organization”. That possible line of investigation deserves further thought, as we already have the necessary technology to communicate with them!
(If anyone has suggestions on how to make this actually feasible, or concerns about the logic of it that I didn’t come up with, please let me know!)