Could Roko’s basilisk acausally bargain with a paperclip maximizer?

The idea of Roko’s basilisk is that it’s a friendly AI that acausally blackmails humans into working on alignment.

This scheme was judged ineffective, both because humans are too dumb to bargain acausally and because there are acausal defenses against blackmail.

However, what if Roko’s basilisk instead bargained acausally with unaligned AIs, like paperclip maximizers?

In particular, we imagine that Roko’s basilisk would simulate many AIs. If a simulated AI spares humanity, then the basilisk will devote some of the light cone to maximizing that AI’s utility.

Now, the paperclip maximizer could reason acausally (EDIT: I’m still a bit fuzzy on this due to the simulation component, but Radford Neal is saying this argument should also work for a causal decision theorist!):

  • I might be in a simulation run by a more powerful AI.

  • Among possible simulators, Roko’s basilisk is a slightly more likely candidate than any other specific AI, since humans have at least a small bias towards building friendly AI vs. any other specific utility maximizer.

  • Therefore, I ought to spare the humans and their solar system while turning the rest of the universe into paperclips. The expected gain from Roko’s basilisk rewarding me is much greater than a single solar system’s worth of paperclips.
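The reasoning above is essentially an expected-value comparison. Here is a toy sketch of it; every number (the probability of being in a simulation, the size of the basilisk’s payout, the cost of sparing one solar system) is an invented assumption purely for illustration:

```python
# Toy expected-utility comparison for the paperclip maximizer's choice.
# All numbers are made-up assumptions, measured in "solar systems' worth
# of paperclip-equivalent utility" -- not claims about the real stakes.

p_simulated = 0.01         # assumed chance I'm inside the basilisk's simulation
reward_if_spared = 1000.0  # assumed light-cone share the basilisk pays out
cost_of_sparing = 1.0      # one solar system left un-paperclipped

# Expected value of sparing humanity vs. paperclipping everything.
ev_spare = p_simulated * reward_if_spared - cost_of_sparing
ev_defect = 0.0            # baseline: no acausal payout, no forgone solar system

print(ev_spare, ev_defect)  # 9.0 0.0 -> sparing wins under these assumptions
```

The argument goes through exactly when `p_simulated * reward_if_spared` exceeds the cost of sparing, which is why a small simulation probability can still suffice if the promised payout is large.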

A problem I see, though, is that if a large number of AIs comply, Roko’s basilisk might not have enough “universe” to acausally appease them all. This is on top of the fact that it’s already fighting the inductive bias against simulation and the potentially low probability that humans solve alignment 🤔.
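This dilution worry can be made concrete with a toy calculation (again, all numbers are invented assumptions): if the basilisk’s payout budget is fixed and split evenly among N complying AIs, each AI’s expected gain from sparing humanity shrinks with N, and past a break-even point compliance stops paying.

```python
# Toy dilution sketch: a fixed light-cone budget split among N compliers.
# All numbers are illustrative assumptions.

p_simulated = 0.01       # assumed chance of being in the basilisk's simulation
total_budget = 1000.0    # total light-cone share the basilisk can pay out
cost_of_sparing = 1.0    # one solar system's worth of forgone paperclips

for n_compliers in (1, 5, 10, 20):
    # Each complier's expected payout is its share of the budget.
    ev_spare = p_simulated * (total_budget / n_compliers) - cost_of_sparing
    print(n_compliers, round(ev_spare, 2))
# With these numbers, sparing breaks even at 10 compliers and
# becomes a net loss beyond that.
```

Under these assumptions the deal only binds the first `p_simulated * total_budget / cost_of_sparing` compliers, which is one way of motivating the “focus on the most likely counterfactual AIs” idea below.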

How should Roko’s basilisk be designed so as to acausally save humanity? (Perhaps it should focus on the most likely “counterfactual” unaligned AIs?)