Thanks for the reply! I knew this idea was flawed somehow lol, because I’m not the most rigorous thinker, but it’s been bugging me for days and it was either write it up or try to write a perfect simulation and crash and burn due to feature creep, so I did the former.
We had no idea whatsoever how to put that sort of precommitment into our AI.
I suppose I should have said they ought to make a reasonable attempt. Attempting and failing should be enough to make you worth cooperating with.
I also strongly suspect that alien life is rare enough for pre-AGI to pre-AGI communication to be unlikely.
Oof! Somehow I didn’t even think of that. A simulation such as I was thinking of writing would probably have shown me that and I would have facepalmed; now I get to facepalm in advance!
The ability to mutually inspect each other's source code is not an assumption you want to make.
Isn’t this assumption the basis of superrationality though? That would be a useless concept if it wasn’t possible for AGIs to prove things about their own reasoning to one another.
In game theory terms, this is a hawk strategy. It only works if the other side backs down.
Good point. I didn’t think of it, but there could be an alien clippy somewhere already expanding in our direction and this sort of message would doom us to be unable to compromise with it and instead get totally annihilated. Another oof...
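For concreteness, the hawk-dove framing above can be sketched as a toy payoff table. The values V and C here are illustrative (not from the discussion), chosen so that a fight costs more than the resource is worth:

```python
# Illustrative hawk-dove payoffs.
# V = value of the contested resource, C = cost of an all-out fight, C > V.
V, C = 2, 10

PAYOFFS = {
    ("hawk", "hawk"): ((V - C) / 2, (V - C) / 2),  # both fight, both pay the cost
    ("hawk", "dove"): (V, 0),                      # dove backs down, hawk takes all
    ("dove", "hawk"): (0, V),
    ("dove", "dove"): (V / 2, V / 2),              # share the resource peacefully
}

# Precommitting to hawk pays off only if the other side plays dove:
print(PAYOFFS[("hawk", "dove")])  # (2, 0)
print(PAYOFFS[("hawk", "hawk")])  # (-4.0, -4.0)
```

Against another committed hawk (an alien clippy that can't back down), the commitment lands both sides in the worst cell of the table.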
And now you are blending the abstract logic of TDT with the approximation that is the human intuitive emotional response.
That’s because I was talking about the naturally evolved alien civilization at that point rather than the AGI they create. Assuming I’m right that these tend to be highly social species, they probably have emotional reactions vaguely like our own, so “think xyz person is an asshole” is a sort of thing they’d do, and they’d have a predictably negative reaction to that regardless of the rational response, the same way humans would.
Given all this: do you think something vaguely like this idea is salvageable? Is there some story where we communicate something to other civilizations, and it somehow increases our chances of survival now, which would seem plausible to you?
Note that transmitting information about alignment doesn't seem to me like it would be harmful. It might not be helpful either, since, as you say, it would almost certainly only be ASIs that even pick it up; but on the off chance that one biont civilization gets the info (assuming we could transmit that far), it might be worth the cost? I'm not sure.
I suppose I should have said they ought to make a reasonable attempt. Attempting and failing should be enough to make you worth cooperating with.
Even if that attempt has ~0% chance of working, and a good chance of making the AI unaligned?
Isn’t this assumption the basis of superrationality though? That would be a useless concept if it wasn’t possible for AGIs to prove things about their own reasoning to one another.
Physicists often assume frictionless spheres in a vacuum. It's not that other things don't exist, just that the physicist isn't studying them at the moment. Superrationality explains how agents should behave with mutual knowledge of each other's source code. Is there a more general theory for how agents should behave when they have only limited evidence about each other's source code? Such a theory isn't well understood yet. Superrationality isn't the assumption that all agents know each other's source code (which is blatantly false in general, whether or not it is true between superintelligences able to exchange nanotech space probes). It's just the decision to study agents that know each other's source code as an interesting special case.
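A toy sketch of that special case, with each agent's source code stood in by an opaque string token (the token names are invented for illustration): a "CliqueBot"-style agent cooperates exactly when the opponent's source is byte-identical to its own, which is one simple way mutual source inspection can support cooperation without being exploitable.

```python
# Each agent is a decision rule that can read the opponent's source code.
# Source code is stood in here by opaque string tokens.

def clique_bot(my_source: str, opponent_source: str) -> str:
    """Cooperate iff the opponent's source is identical to mine."""
    return "C" if opponent_source == my_source else "D"

def defect_bot(my_source: str, opponent_source: str) -> str:
    """Always defect, regardless of what the opponent's source says."""
    return "D"

CLIQUE_SRC = "clique_bot_v1"  # stand-in for clique_bot's full program text
DEFECT_SRC = "defect_bot_v1"  # stand-in for defect_bot's full program text

# Two CliqueBots recognize each other and cooperate:
print(clique_bot(CLIQUE_SRC, CLIQUE_SRC))  # C
# Against anything else, CliqueBot safely defects:
print(clique_bot(CLIQUE_SRC, DEFECT_SRC))  # D
```

The fragile part is exactly the assumption under discussion: both sides need verified access to the other's actual source, not just limited or spoofable evidence about it.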
That’s because I was talking about the naturally evolved alien civilization at that point rather than the AGI they create. Assuming I’m right that these tend to be highly social species, they probably have emotional reactions vaguely like our own, so “think xyz person is an asshole” is a sort of thing they’d do, and they’d have a predictably negative reaction to that regardless of the rational response, the same way humans would.
The human emotional response vaguely resembles TDT-type reasoning. I would expect alien evolved responses to resemble TDT about as much, but in a totally different direction, in the sense that once you know TDT, learning about humans tells you nothing about aliens. Evolution produces somewhat inaccurate maps of the TDT territory, and I don't expect the same inaccuracies to appear on both maps.
Given all this: do you think something vaguely like this idea is salvageable? Is there some story where we communicate something to other civilizations, and it somehow increases our chances of survival now, which would seem plausible to you?
I don't know. In any story where we receive a signal from aliens, that signal could just as well be helpful or harmful.
We could just broadcast our values into space, and hope the aliens are nice. (Knowing full well that the signals would also help evil aliens be evil.)