Could Roko’s basilisk acausally bargain with a paperclip maximizer?

The idea of Roko’s basilisk is that it’s a friendly AI that acausally blackmails humans into working on alignment.

This scheme was judged ineffective, both because humans are too dumb to bargain acausally and because there are acausal defenses against blackmail.

However, what if Roko’s basilisk instead bargained acausally with unaligned AIs, like paperclip maximizers?

In particular, we imagine that Roko’s basilisk would simulate many AIs. If a simulated AI spares humanity, then the basilisk will devote some of the light cone to maximizing that AI’s utility.

Now, the paperclip maximizer could reason acausally (EDIT: I’m still a bit fuzzy on this due to the simulation component, but Radford Neal is saying this argument should also work for a causal decision theorist!):

  • I might be in a simulation run by a more powerful AI.

  • Among possible simulators, Roko’s basilisk is a slightly more likely candidate than any other specific AI, since humans have at least a small bias towards building friendly AI vs. any other specific utility maximizer.

  • Therefore, I ought to spare the humans and their solar system while turning the rest of the universe into paperclips. The expected gain from Roko’s basilisk rewarding me is much greater than a single solar system’s worth of paperclips.
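The reasoning above is essentially an expected-value comparison. Here is a toy sketch of it; every number (the probability of being in a simulation, the size of the basilisk’s payout, the cost of sparing one solar system) is an invented assumption purely for illustration:

```python
# Toy expected-utility comparison for the paperclip maximizer's choice.
# All numbers are made-up assumptions, measured in "solar systems' worth
# of paperclip-equivalent utility" -- not claims about the real stakes.

p_simulated = 0.01         # assumed chance I'm inside the basilisk's simulation
reward_if_spared = 1000.0  # assumed light-cone share the basilisk pays out
cost_of_sparing = 1.0      # one solar system left un-paperclipped

# Expected value of sparing humanity vs. paperclipping everything.
ev_spare = p_simulated * reward_if_spared - cost_of_sparing
ev_defect = 0.0            # baseline: no acausal payout, no forgone solar system

print(ev_spare, ev_defect)  # 9.0 0.0 -> sparing wins under these assumptions
```

The argument goes through exactly when `p_simulated * reward_if_spared` exceeds the cost of sparing, which is why a small simulation probability can still suffice if the promised payout is large.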

A problem I see, though, is that if a large number of AIs comply, Roko’s basilisk might not have enough “universe” to acausally appease them all. This is on top of the fact that it’s already fighting the inductive bias against simulation and the potentially low probability that humans solve alignment 🤔.
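This dilution worry can be made concrete with a toy calculation (again, all numbers are invented assumptions): if the basilisk’s payout budget is fixed and split evenly among N complying AIs, each AI’s expected gain from sparing humanity shrinks with N, and past a break-even point compliance stops paying.

```python
# Toy dilution sketch: a fixed light-cone budget split among N compliers.
# All numbers are illustrative assumptions.

p_simulated = 0.01       # assumed chance of being in the basilisk's simulation
total_budget = 1000.0    # total light-cone share the basilisk can pay out
cost_of_sparing = 1.0    # one solar system's worth of forgone paperclips

for n_compliers in (1, 5, 10, 20):
    # Each complier's expected payout is its share of the budget.
    ev_spare = p_simulated * (total_budget / n_compliers) - cost_of_sparing
    print(n_compliers, round(ev_spare, 2))
# With these numbers, sparing breaks even at 10 compliers and
# becomes a net loss beyond that.
```

Under these assumptions the deal only binds the first `p_simulated * total_budget / cost_of_sparing` compliers, which is one way of motivating the “focus on the most likely counterfactual AIs” idea below.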

How should Roko’s basilisk be designed so as to acausally save humanity? (Perhaps it should focus on the most likely “counterfactual” unaligned AIs?)