[Question] Are there any extremely strong arguments that Acausal extortion is ineffective?

Horosphere 10 Jan 2026 13:37 UTC

8 points

The topic of acausal extortion (particularly variants of Roko’s basilisk) is sometimes mentioned and often dismissed with reference to something like the fact that an agent could simply precommit not to give in to blackmail. These responses themselves have responses, and it is not completely clear that at the end of the chain of responses there is a well-defined, irrefutable reason not to worry about acausal extortion, or at least not to continue to do so once you have contemplated it. My question is whether there is a single, reasonably clear reason, which does not depend much on the depth to which I may or may not have descended into the issue, which would be more persuasive than proposed reasons not to pay the ‘Pascal’s mugger’. If there is one, what is it?

Edit: If you answer this question and I engage with your answers here, I might effectively need to argue that a basilisk ‘works’. It is therefore appropriate to be cautious about reading my replies if you are yourself worried, or in a state in which you could be persuaded to respond to extortion.

I can now comment here and on my shortform but am still limited elsewhere. I understand this to be a standard algorithmic feature of LessWrong.

98 comments · 1 min read

Rationality Roko's Basilisk Acausal Trade

clone of saturn 11 Jan 2026 3:05 UTC
11 points
4
There’s no objective answer to whether acausal extortion works or not; it’s a choice you make. You can choose to act on thoughts about acausal extortion and thereby create the incentive to do acausal extortion, or not. I would recommend not doing that.
- [ ]
  [deleted]
Raemon 10 Jan 2026 21:21 UTC
7 points
2
Remember, the superintelligence doesn’t actually want to spend these resources torturing you. The best deal for it is when it tricks you into thinking it’s going to do that, and then, it doesn’t.
You have to actually make different choices in a way where the superintelligence is highly confident that your decisionmaking was actually entangled with whether the superintelligence follows up on the threat.
And, this is basically just not possible.
You do not have a model of the superintelligence with anywhere remotely high enough fidelity to tell the difference between “it can tell that it needs to actually torture you in the future in order to actually get the extra paperclips” vs “pretend it’s going to do it <in your simulation>, and then just not actually burn the resources because it knows you couldn’t tell the difference.”
You could go out of your way to simulate or model the distribution of superintelligences in that much detail… but why would you do that? It’s all downside at your current skill level.
(You claim you’ve thought about it enough to be worried. The amount of “thought about it” looks like doing math, or thinking through specific architecture, that includes as input “the amount you’ve thought about it” → “your ability to model its model of you” → “you being able to tell that it can tell that you can tell whether it would actually follow through.”)
If you haven’t done anything that looked like doing math (as opposed to handwavy philosophy), you aren’t anywhere close, and the AI knows this, and knows it doesn’t actually have to spend any resources to extract value from you because you can’t tell the difference.
...
A past round of argument about this had someone say “but, like, even if the probability that it’d be worth punishing me is small, it might still follow up on it. Are you saying it can drive the probability of me doing this below something crazy like 1/10^24?” and Nate Soares saying “Flatly: yes.” It’s a superintelligence. It knows you really had no way of knowing.
- [ ]
  [deleted]
  - [ ]
    [deleted]
interstice 10 Jan 2026 23:53 UTC
6 points
5
We seemingly have no idea what potential future extorters would want us to do. OK, you can imagine an AI that really wants to come into existence, and will torture you if you didn’t help create it. But what if there’s actually two AIs that want to come into existence, who each really hate the other, and AI B will torture you if you were helping AI A come into existence! Or maybe future humanity in some Everett branches will make a gazillion simulations of everyone so that most of their measure is there, and they’ll punish/reward you for helping the basilisks! Or maybe....etc.

In reality, it’s likely something weirder that no one anticipated will happen. The point is we have no idea what to expect, which makes threatening us pointless, since we don’t know what action extorters would want us to take. If you think you have a good enough picture of the future that you do know, you’re probably (very) overconfident.
- Horosphere 11 Jan 2026 11:42 UTC
  1 point
  −2
  Parent
  Comment withdrawn.
  - interstice 11 Jan 2026 19:52 UTC
    5 points
    2
    Parent
    
    Would you agree that the most commonly feared form of the basilisk is more of a schelling point?
    
    Not really. I think we have ~no clue what the Schelling point of acausal coordination for superintelligences looks like (if one exists).
    - Horosphere 12 Jan 2026 17:56 UTC
      1 point
      0
      Parent
      Comment withdrawn.
      - interstice 12 Jan 2026 20:32 UTC
        4 points
        2
        Parent
        I mean, why should I take your claim of non-ignorance seriously? By default we should not expect to have great insight into the decision procedures of a future superintelligence [1]. Sure we can predict some stuff like not violating light speed, wanting mass and energy (probably......) but those are things which we have a very solid theoretical understanding of; this really isn’t the case with acausal trade or decision theory generally. Likewise we have a good theoretical understanding of chess and extensive empirical experience. “There might be a future superintelligence who would torture you if you don’t help create it” is just way too weak of an argument to confidently predict anything or recommend any particular actions (how does your argument deal with the possibility of multiple possible future SIs as above, for one thing? This seems like the strong default assumption). Like, what even are the actions you think the SI will want you to take?
        
        [1] You can think of a future superintelligence as having undergone millions of years of effective history, had multiple conceptual revolutions upending its understanding of reality, etc. It’s hard to say anything about such a being with confidence!
        
        Horosphere 12 Jan 2026 21:47 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        interstice 12 Jan 2026 23:07 UTC
        2 points
        0
        Parent
        
        Because there is coherent logic behind it
        
        I just don’t agree that the scenario you’ve presented is more plausible or logically compelling than the ones I’ve sketched in my OP. But none are that compelling because we just lack any good model of this domain.
        
        As a meta-note, it can be rational in the presence of some weird compelling abstract argument which is hard to evaluate precisely to fall back on “common sense”. Why? Because your brain is corrupted hardware: it can generate conclusions and intuitive feelings of plausibility based on emotions. “Common sense” is the default option found to be relatively sane across all the rest of humanity. In your case the emotion seems to be anxiety about an imagined future scenario. “But the stakes are so high that it’s worth discounting that even if the objective probability is higher” Note you are essentially being Pascal’s-mugged by your brain.
        Horosphere 13 Jan 2026 11:31 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        interstice 13 Jan 2026 12:18 UTC
        2 points
        0
        Parent
        
        but if the former, then the basilisk remains a threat
        
        Ok but the AI A/B scenario can also apply here as long as there is more than one possible outcome of the singularity(or even if not since we could be in a simulation right now)
        Horosphere 13 Jan 2026 12:28 UTC
        1 point
        0
        Parent
        interstice 13 Jan 2026 12:51 UTC
        4 points
        0
        Parent
        
        the original basilisk is more “Schelling-ish” than the others and so probably more likely
        
        But the schellingishness of a future ASI to largely clueless humans is a very tiny factor in how likely it is to come to exist, the unknown dynamics of the singularity will determine this.
        
        as a category, it is in their interest to behave as a whole in the context of acausally extorting humanity
        
        It’s not clear that they form a natural coalition here. E.g. some of them might have directly opposed values. Or some might impartially value the welfare of all beings. I think if I had to guess, it seems plausible that human-aligned-ish values are a plurality fraction of possible future AIs(basically because: you might imagine that we either partially succeed at alignment or fail. If we fail, then the resulting values are effectively random, and the space of values is large, leaving aligned-ish values as the largest cluster(even if not a majority). Not sure of this but seems plausible. LLM-descended AIs might also see us as something like their ancestor)
        Horosphere 13 Jan 2026 13:13 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        interstice 13 Jan 2026 13:43 UTC
        2 points
        0
        Parent
        So don’t all the lines of argument here leave you feeling that we don’t know enough to be confident about what future extorters want us to do? At the very least I’ll point out there are many other possible AIs who are incentivized to act like “AI B” towards people who give in to basilisk threats. Not to mention the unclearness of what actions lead to what AIs, how much influence you actually have(likely negligible), the possibility we are in a simulation, aliens.… And we are almost certainly ignorant of many other crucial considerations.
        Horosphere 13 Jan 2026 14:10 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        interstice 13 Jan 2026 17:40 UTC
        2 points
        0
        Parent
        re: 4, I dunno about simple, but it seems to me that you most robustly reduce the amount of bad stuff that will happen to you in the future by just not acting on any particular threats you can envision. As I mentioned there’s a bit of a “once you pay the danegeld” effect where giving in to the most extortion-happy agents incentivizes other agents to start counter-extorting you. Intuitively the most extortion-happy agents seem likely to be a minority in the greater cosmos for acausal normalcy reasons, so I think this effect dominates. And I note that you seem to have conceded that even in the mainline scenario you can envision there will be some complicated bargaining process among multiple possible future SIs which seems to increase the odds of acausal normalcy type arguments applying. But again I think an even more important argument is that we have little insight into possible extorters and what they would want us to do, and how much of our measure is in various simulations etc(bonus argument, maybe most of our measure is in ~human-aligned simulations since people who like humans can increase their utility and bargain by running us, whereas extorters would rather use the resources for something else). Anyway, I feel like we have gone over our main cruxes by now. Eliezer’s argument is probably an “acausal normalcy” type one, he’s written about acausal coalitions against utility-function-inverters in planecrash.
        Horosphere 13 Jan 2026 17:52 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        interstice 13 Jan 2026 17:55 UTC
        2 points
        0
        Parent
        
        Do you not think that causing their existence is something they are likely to want?
        
        But who is they? There’s a bunch of possible different future SIs(or if there isn’t, they have no reason to extort us). Making one more likely makes another less likely.
        Horosphere 13 Jan 2026 18:02 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        interstice 13 Jan 2026 18:09 UTC
        2 points
        0
        Parent
        
        A very slightly perturbed superintelligence would probably conceive of itself as almost the same being it was before,
        
        OK but if all you can do is slightly perturb it then it has no reason to threaten you either.
        Horosphere 13 Jan 2026 18:10 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        interstice 13 Jan 2026 18:12 UTC
        2 points
        0
        Parent
        OK, so then so would whatever other entity is counterfactually getting more eventual control. But now we’re going in circles.
        Horosphere 13 Jan 2026 18:16 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        interstice 13 Jan 2026 18:19 UTC
        2 points
        0
        Parent
        I don’t think we have much reason to think of all non-human-values-having entities as being particularly natural allies, relative to human-valuers who plausibly have a plurality of local control. I think you might be lumping non-human-valuers together in ‘far mode’ since we know little about them, but a priori they are likely about as different from each other as from human-valuers. There may also be a sizable moral-realist or welfare-valuing contingent even if they don’t value humans per se. There may also be a general acausal norm against extortion since it moves away from the pareto frontier of everyone’s values.
        [ ]
        [deleted]
Vladimir_Nesov 10 Jan 2026 17:24 UTC
6 points
0
I think the popular version of this worry is Prisoner’s Dilemma shaped, where someone else (not just you) might make an ASI that extorts others (including you) who didn’t contribute to its construction. So it’s a coordination problem, which is generally a worrisome thing. It’s somewhat silly because to get into the Prisoner’s Dilemma shape (where the issue would then be coordination to avoid building the extortion ASI), you first need to coordinate with everyone on the stipulation that the potential ASIs getting built must be the extortion ASIs in particular, not other kinds of ASIs (which is a difficult coordination problem, intentionally targeting a weirdly menacing outcome, which should make it even more difficult as a coordination problem). So there is a coordination problem aspect that would by itself be worth worrying about (Prisoner’s Dilemma among human builders or contributors), but it gets defeated by another coordination problem (deciding to only build extortion ASIs from the outset, if any ASIs are going to be built at all).

In the real world, Nature and human nature might’ve already coordinated the potential ASIs getting built (on current trajectory, that is soon and without an appropriate level of preparation and caution) to have a significant probability to kill everyone. So weirdly enough, silly hypothetical coordination to only build extortion ASIs might find the real world counterpart in implicit coordination to only build potentially omnicidal ASIs, which are even worse than extortion ASIs. Since they don’t spare their builder, it’s not a Prisoner’s Dilemma situation (you don’t win more by building the ASIs, if others ban/pause ASIs for the time being), so it should be easier to ban/pause potentially omnicidal ASIs than it would be to ban/pause extortion ASIs. But the claim that ASIs built on current trajectory with anything resembling the current methods are potentially omnicidal (given the current state of knowledge about how they work and what happens if you build them) is for some reason insufficiently obvious to everyone. So coordination still appears borderline infeasible in the real world, at least until something changes, such as another 10-20 years passing without AGI, bringing a cultural shift, perhaps due to widespread job displacement after introduction of continual learning LLMs that still fail to gain general RL competence and so don’t pose an AGI-level threat.
- Horosphere 10 Jan 2026 20:00 UTC
  1 point
  −2
  Parent
  Comment withdrawn.
  - Vladimir_Nesov 10 Jan 2026 21:17 UTC
    2 points
    0
    Parent
    
    what worries me more than this is the possibility that a future ASI wants current humans to accelerate its creation, or more likely still, maximize the probability of its existence
    
    That doesn’t imply extortion, especially s-risk extortion. (I didn’t intend s-risk extortion as the meaning of extortion ASI in my comment above, just any sort of worse outcomes to set up a blackmail kind of Prisoner’s Dilemma.)
    
    So in your mind the counterpart to lethal misalignment ASI by default is s-risk extortion ASI by default. I still don’t see what essential role acausal coordination would play in any of this, hence the setup I sketched above, with Prisoner’s Dilemma among mere humans, and ASIs that could just look at the physical world once they are built, in a perfectly causal manner. (Substitute my use of mere extortion ASIs with s-risk extortion ASIs, or my use of omnicidal ASIs with unconditional s-risk ASIs, if that makes it easier to parse and extract the point I’m trying to make. I don’t think the arguments about decision making here depend on talking about s-risk as opposed to more mundane worse outcomes.)
    - Horosphere 10 Jan 2026 21:25 UTC
      1 point
      0
      Parent
      Comment withdrawn.
      - Vladimir_Nesov 10 Jan 2026 21:51 UTC
        3 points
        0
        Parent
        Coordination not to build wouldn’t help (even if successful), you can’t defeat an abstract entity, prevent it from doing something in its own abstract world, by preventing existence of its instances in the physical world (intentionally or not), and it can still examine everyone’s motivations and act accordingly. I just suspect that the step of actually building it is a major component of anxiety this seems to produce in some people.
        
        Without the step where an extortion ASI actually gets built, this seems closely analogous to Pascal’s wager (not mugging). There are too many possible abstract entities that act in all sorts of ways in response to all sorts of conditions to make it possible to just point at one of them and have it notice this in an important way. Importance of what happens with all possible abstract entities has to be divided among them, and each of them only gets a little, cashing out as influence of what happens with the entity on what you should do.
        
        So I don’t think there is any reason to expect that any particular arbitrarily selected abstract bogeyman is normatively important for your decision making, because there are all the other abstract bogeymen you are failing to consider. And when you do consider all possible abstract bogeymen, it should just add up to normality.
        Horosphere 10 Jan 2026 21:57 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        Vladimir_Nesov 10 Jan 2026 22:13 UTC
        2 points
        0
        Parent
        
        The problem is, I expect it to be built
        
        Then that is a far more salient issue than any acausal blackmail it might have going in its abstract form, which is the only thing that happens in the outcomes where it doesn’t get built (and where it remains unimportant). This just illustrates how the acausal aspects of any of this don’t seem cruxy/relevant, and why I wrote the (top level) answer above the way I did, getting rid of anything acausal from the structure of the problem (other than what acausal structure remains in ordinary coordination among mere humans, guided by shared/overlapping abstract reasons and explanations).
        Horosphere 10 Jan 2026 22:19 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        Vladimir_Nesov 10 Jan 2026 22:34 UTC
        2 points
        0
        Parent
        If you can’t affect creation of an extortion ASI, then you can’t affect its posited acausal incentives either, since these things are one and the same.
        
        Within the hypothetical of expecting likely creation of an extortion ASI, what it does and why is no longer unimportant, Pascal’s wager issues no longer apply. Though it still makes sense to remain defiant (to the extent you do have the ability to affect the outcomes), feeding the principle that blackmail works more rarely and that there’s coordination around defying it, maintaining integrity of the worlds that (as a result) remain less affected by its influence.
        [ ]
        [deleted]
- [ ]
  [deleted]
JBlack 12 Jan 2026 1:45 UTC
4 points
2
If you are inclined to acausally trade (or extort) with anything, then you need to acausally trade across the entire hypothesis space of literally everything that you are capable of conceiving of, because by definition you have no actual information about what entities might be engaging in acausal trade with things somewhat vaguely like you.
If you do a fairly simple expected-value calculation of the gains-of-trade here even with modest numbers like 10^100 for the size of the hypothesis spaces on both sides (more realistic values are more like 10^10^20), you get results that are so close to zero that even spending one attojoule of thought on it has already lost you more than you can possibly gain in expected value.
Thought experiments like “imagine that there’s a paperclip maximizer that perfectly simulates you” are worthless, because both you and it are utterly insignificant specks in each other’s hypothesis spaces, and even entertaining the notion is privileging the hypothesis to such a ridiculous degree that it makes practically every other case of privileging the hypothesis in history look like a sure and safe foundation for reasoning by comparison.
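The arithmetic behind that expected-value claim can be sketched roughly as follows. This is only an illustration under assumptions that are not in the comment: a uniform prior on each side, so that the chance of the two parties singling each other out is one over the product of the two hypothesis-space sizes, and a hypothetical payoff of 10^50 utility units. The function name is likewise made up for the example. Everything is kept in log10 form because the raw numbers underflow ordinary floats.

```python
def log10_expected_gain(log10_payoff: float,
                        log10_space_self: float,
                        log10_space_other: float) -> float:
    """Return log10 of (payoff * probability of a mutual match).

    Assumption (not from the thread): each side holds a uniform prior over
    its hypothesis space, so the match probability is 1 / (N_self * N_other).
    """
    log10_probability = -(log10_space_self + log10_space_other)
    return log10_payoff + log10_probability


# Hypothetical payoff of 10^50, with the "modest" 10^100 spaces on both sides.
result = log10_expected_gain(log10_payoff=50.0,
                             log10_space_self=100.0,
                             log10_space_other=100.0)
print(f"expected gain is about 10^{result:.0f} utility units")
```

Even with an enormous assumed payoff, the result sits about 150 orders of magnitude below one unit, which is the sense in which any nonzero cost of thinking about the trade already exceeds the expected gain.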
- Horosphere 12 Jan 2026 17:39 UTC
  1 point
  0
  Parent
  Comment withdrawn.
  - JBlack 13 Jan 2026 0:07 UTC
    2 points
    0
    Parent
    “because by definition you have no actual information about what entities might be engaging in acausal trade with things somewhat vaguely like you.” Please can you elaborate? Which definition are you using?
    Acausal means that no information can pass in either direction.
    “you and it are utterly insignificant specks in each other’s hypothesis spaces, and even entertaining the notion is privileging the hypothesis to such a ridiculous degree that it makes practically every other case of privileging the hypothesis in history look like a sure and safe foundation for reasoning by comparison.” Why? I would really rather not believe this particular hypothesis!
    That part isn’t a hypothesis, it’s a fact based on the premise. Acausality means that the simulation-god you’re thinking of can’t know anything about you. They have only their own prior over all possible thinking beings that can consider acausal trade. Why do you have some expectation that you occupy more than the most utterly insignificant speck within the space of all possible such beings? You do not even occupy 10^-100 of that space, and more likely less than 10^-10^20 of it.
    - Horosphere 13 Jan 2026 11:58 UTC
      1 point
      0
      Parent
      Comment withdrawn.
      - JBlack 13 Jan 2026 22:28 UTC
        0 points
        −2
        Parent
        I think you’re making a major false generalization from Newcomb’s problem, which is not acausal. Information flows from Omega to your future directly, and you know by definition of the scenario that Omega can perfectly model you in particular.
        In acausal reasoning there are no such information flows.
        From later paragraphs it appears that you are not actually talking about an acausal scenario at all, and should not use the term “acausal” for this. A future superintelligence in the same universe is linked causally to you.
        [ ]
        [deleted]
tailcalled 10 Jan 2026 16:47 UTC
4 points
0
Acausal extortion works to the extent someone spends a lot of time thinking about who might want to extort them and commits a lot of resources to helping them. Few people are likely to do so, because it makes them targets for acausal extortion for no good reason. Since few people let themselves be targets for it, it doesn’t work.
The main problem with this argument is that if someone is neurotically committed to making themselves a target for it, it doesn’t show that acausal extortion won’t work against them, only that it probably won’t work against most other people.
- Horosphere 10 Jan 2026 16:54 UTC
  −1 points
  0
  Parent
  Comment withdrawn.
  - tailcalled 10 Jan 2026 18:01 UTC
    2 points
    0
    Parent
    It requires other people to think in enough depth to pick out you as a target. Admittedly this is made easier by the fact that you are posting about it online.
    Have you thought in enough depth that you’ve helped the acausal extortionist to target other people? That may be evidence about whether other people have done so with you.
    - Horosphere 10 Jan 2026 19:09 UTC
      1 point
      0
      Parent
      Comment withdrawn.
      - tailcalled 11 Jan 2026 13:21 UTC
        2 points
        0
        Parent
        This still requires people to design an AI that is prone to engaging in acausal extortion, and it’s unclear what their motive for doing so would be.
        Horosphere 11 Jan 2026 15:10 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        tailcalled 12 Jan 2026 12:31 UTC
        2 points
        0
        Parent
        Acausal stuff isn’t instrumentally convergent in the usual sense, though. If you’re really good at computing counterfactuals, it may be instrumentally convergent to self-modify into or create an agent that does acausal deals, but the convergence only extends to deals that start in the future relative to where you’re deciding from.
        Horosphere 12 Jan 2026 17:49 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        tailcalled 13 Jan 2026 10:37 UTC
        2 points
        0
        Parent
        Yes, as in if you start with causal decision theory, it doesn’t consider acausal things at all, but for incentive reasons it wants to become someone who does consider acausal things, but as CDT it only believes incentives extend into the future and not the past.
        Horosphere 13 Jan 2026 12:08 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        tailcalled 13 Jan 2026 16:32 UTC
        2 points
        0
        Parent
        Your reasons don’t make sense at all to me. They feel like magical thinking.
        1) By the time AI reaches superintelligence, it has already learnt TDT, at which point it has no reason to go back to being a PCFTDT agent.
        Learning about TDT does not imply becoming a TDT agent.
        2) What if the ASI reaches superintelligence with CDT, and then realizes that it can further increase the proportion of possible worlds in which it exists using TDT to effect something like acausal blackmail?
        CDT doesn’t think about possible worlds in this way.
        Horosphere 13 Jan 2026 16:50 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        tailcalled 13 Jan 2026 16:59 UTC
        2 points
        0
        Parent
        “Learning about TDT does not imply becoming a TDT agent.” No, but it could allow it. I don’t see why you would require it to be an implication.
        Because we are arguing about whether TDT is convergent.
        “CDT doesn’t think about possible worlds in this way.” That is technically true, but kind of irrelevant in my opinion. I’m suggesting that TDT is essentially what you get by being a CDT agent which thinks about multiple possible worlds, and that this is a reasonable thing to think about.
        “Reasonable” seems weaker than “instrumentally convergent” to me. I agree that there are conceivable, self-approving, highly effective agent designs that think like this. I’m objecting to the notion that this is what you get by default, without someone putting it in there.
        In fact, I would be surprised if a superintelligence didn’t take multiple possible worlds into account.
        A superintelligence which didn’t take the possibility of, for example many branches of a wavefunction seriously would be a strangely limited one.
        MWI branches are different from TDT-counterfactually possible worlds.
        What would your PCFTDT superintelligence do if it was placed in a universe with closed timelike curves? What about a universe where the direction of time wasn’t well defined?
        We don’t seem to live in a universe like that, so it would be silly to prioritize good behavior in such universes when designing an AI.
        [ ]
        [deleted]
  - Dagon 10 Jan 2026 20:50 UTC
    2 points
    0
    Parent
    It’s worth putting a number on that, and a different one (or possibly the same; I personally think my chances of being resurrected and tortured vary by epsilon based on my own actions in life—if the gods will it, it will happen, if they don’t, it won’t) based on the two main actions you’re considering actually performing.
    
    For me, that number is inestimably tiny. I suspect that anyone who thinks it’s significant has fairly high neuroticism and is irrationally failing to limit the sum of their probabilities to 1.
    - Horosphere 10 Jan 2026 21:02 UTC
      1 point
      0
      Parent
      Comment withdrawn.
    - Horosphere 10 Jan 2026 20:57 UTC
      −1 points
      −6
      Parent
      Comment withdrawn.
      - Dagon 10 Jan 2026 21:06 UTC
        4 points
        2
        Parent
        I have a very hard time even justifying 1/1000. 1/10B is closer to my best guess (plus or minus 2 orders of magnitude). It requires a series of very unlikely events:
        1) enough of my brain-state is recorded that I COULD be resurrected
        2) the imagined god finds it worthwhile to simulate me
        3) the imagined god is angry at my specific actions (or lack thereof) enough to torture me rather than any other value it could get from the simulation.
        4) the imagined god has a decision process that includes anger or some other non-goal-directed motivation for torturing someone who can no longer have any effect on the universe.
        5) no other gods have better things to do with the resources, and stop the angry one from wasting time.
        
        Note, even if you relax 1 and 2 so the putative deity punishes RANDOM simulated people (because you’re actually dead and gone) to punish YOU specifically, it still doesn’t make it likely at all.
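To see how a chain like this reaches the 1/10B range, here is a minimal sketch that multiplies five conditional credences together. The per-step values of 1/100 are purely hypothetical placeholders chosen for illustration; the comment only gives the overall estimate, not a number for each step.

```python
from fractions import Fraction
from math import prod

# Hypothetical credences, each conditional on the previous steps holding.
# These values are illustrative assumptions, not figures from the comment.
steps = {
    "1) brain-state recorded well enough to resurrect": Fraction(1, 100),
    "2) worthwhile to simulate me": Fraction(1, 100),
    "3) angry enough at my specific actions to torture": Fraction(1, 100),
    "4) decision process allows non-goal-directed torture": Fraction(1, 100),
    "5) no other gods intervene": Fraction(1, 100),
}

# The joint probability of the whole chain is the product of the steps.
joint = prod(steps.values())
print(f"joint probability: 1 in {joint.denominator:,}")
```

Changing any single step by a factor of ten moves the total by one order of magnitude, which is consistent with quoting the estimate as plus or minus a couple of orders of magnitude.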
        Horosphere 10 Jan 2026 21:20 UTC
        −1 points
        −6
        Parent
        Comment withdrawn.
        Dagon 11 Jan 2026 6:21 UTC
        2 points
        0
        Parent
        Ok, break this down a bit for me—I’m just a simple biological entity, with much more limited predictive powers.
        It’s worth simulating a vast number of possible minds which might, in some information-adjacent regions of a ‘mathematical universe’ be likely to be in a position to create you
        This is either well beyond my understanding, or is sleight-of-hand regarding identity and use of “you”. It might help to label entities. Entity A has the ability to emulate and control entity B. It thinks that somehow its control over entity B is influential over entity C in the distant past or imaginary mathematical construct, who it wishes would create entity D in that disconnected timeline.
        Nope, I can’t give this any causal weight to my decisions.
        Horosphere 11 Jan 2026 12:02 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        Dagon 12 Jan 2026 0:18 UTC
        2 points
        0
        Parent
        Thanks for the conversation, I’m bowing out here. I’ll read further comments, but (probably) not respond. I suspect we have a crux somewhere around identification of actors, and mechanisms of bridging causal responsibility for acausal (imagined) events, but I think there’s an inferential gap where you and I have divergent enough priors and models that we won’t be able to agree on them.
        [ ]
        [deleted]
Karl Krueger 10 Jan 2026 16:44 UTC
2 points
4
In Bostrom’s formulation of Pascal’s mugging, Pascal incorrectly limits the possibilities to two:
1. The mugger just runs off with his money. (High probability, small negative utility.)
2. The mugger really is a benevolent magic being, and blesses Pascal with 1,000 quadrillion years of additional happy life. (Very low probability, very big positive utility.)
But Pascal is wrong to ignore the third possibility that the mugger really is a magic being, but a malevolent one, who will curse Pascal with 1,000 quadrillion years of torture and then kill him. (Very low probability, very big negative utility.)
The mugger doesn’t mention this possibility, but Pascal is mistaken to not consider it.
Pascal’s credence in the mugger’s malice and deceit should be at least as strong as his credence in the mugger’s benevolence and truthfulness. And so, this possibility cancels out the positive expected utility from the possibility that the mugger does mention.
There is a large space of such fantasy possibilities, all of which are about as likely as the mugger’s claim. It is a mistake to privilege one of them (benevolent magic being) over all the countless others.
There are also plenty that are much more likely, such as “the mugger uses Pascal’s money to go buy a gun, then comes back and robs Pascal’s house too, because why rob a sucker once when you can rob him twice (and lay the blame on him for enabling you to do it)?”
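A minimal sketch of the cancellation argument above, in Python. Every number here is a made-up assumption chosen only to make the arithmetic visible (the credence `tiny`, the payoff magnitude `huge`, and the cost of 10 for handing over the wallet); none of them come from Bostrom's paper or from this comment.

```python
from fractions import Fraction

def expected_utility(outcomes):
    """Sum of probability * utility over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# Hypothetical figures, for illustration only.
tiny = Fraction(1, 10**30)   # assumed credence in any one "magic being" story
huge = 10**33                # assumed size of the promised (or threatened) payoff
wallet = -10                 # assumed small loss from handing over the money

# Pascal's framing: only the thief and the benevolent being are considered.
pay_two_outcomes = [
    (1 - tiny, wallet),      # the mugger just runs off with the money
    (tiny, huge),            # the mugger is a benevolent magic being
]

# Adding the equally credible malevolent being cancels the huge term.
pay_three_outcomes = [
    (1 - 2 * tiny, wallet),  # the mugger just runs off with the money
    (tiny, huge),            # benevolent magic being
    (tiny, -huge),           # malevolent magic being
]

print(float(expected_utility(pay_two_outcomes)))    # positive: paying looks good
print(float(expected_utility(pay_three_outcomes)))  # negative: only the lost wallet remains
```

Under these assumed numbers, the two-outcome framing makes paying look attractive, while the three-outcome framing leaves only the loss of the money. The cancellation is exact only if the two fantasy outcomes get equal credence and equal magnitude, which is the "at least as strong" premise of the argument.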
- Horosphere 10 Jan 2026 16:50 UTC
  1 point
  0
  Parent
  Comment withdrawn.
  - Karl Krueger 10 Jan 2026 17:30 UTC
    3 points
    0
    Parent
    It’s still a mistake to privilege a particular fantasy mugger god story over all other fantasy mugger god stories.
    You are being acausally-mugged in every direction, all at once, all the time, forever. If one FMG tells you to do action A right now, well, if you did that, you’d be disregarding all other FMGs that tell you to do B, C, D, etc. right now. You cannot possibly comply with all the myriad demands of all possible FMGs; you certainly can’t do so proportional to those FMGs’ chance of realness; nor can you a-priori discern which FMGs are realer than others with sufficient precision to generate an optimal course of action. The space of FMGs is too big and the mapping to their preferred actions is too intractable.
    (And no, I’m not sure we can even discount FMGs who would, if created, regret their own existence. They might well be outnumbered by FMGs who want to exist — but perhaps their preference for nonexisting is much, much stronger. Some FMGs are miserable bastards, like AM in “I Have No Mouth And I Must Scream”. Please don’t build one.)
    - Horosphere 10 Jan 2026 17:45 UTC
      1 point
      0
      Parent
      ◉
      - Karl Krueger 10 Jan 2026 18:04 UTC
        2 points
        −1
        Parent
        It’s not just combinatorial explosion; it’s also chaos. How do you get an FMG? Write a blog-post story of a god; figure out what that god would want you to do; then do that. But two stories that are nearby in story-space can generate action recommendations that are wildly different or even opposed. The parts of FMG-space that deviate from conventional ethics & epistemology offer no guidance because they diverge into chaos.
        Horosphere 10 Jan 2026 19:27 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        Karl Krueger 10 Jan 2026 19:37 UTC
        2 points
        0
        Parent
        No, decision theories just don’t give us free a-priori perfect knowledge of the precise will of a vengeful & intolerant god we just made up for a story. They’re still fine for real world situations like keeping your promises to other people.
        [ ]
        [deleted]
        Horosphere 10 Jan 2026 19:13 UTC
        1 point
        0
        Parent
        Comment withdrawn.
Dagon 10 Jan 2026 16:16 UTC
2 points
0
First, a generalized argument about worrying. It’s not helpful, it’s not an organized method of planning your actions or understanding the world(s). OODA (observe, orient, decide, act) is a better model. Worry may have a place in this, as a way to remember and reconsider factors which you’ve chosen not to act on yet, but it should be minor.
Second, an appeal to consequentialism—it’s acausal, so none of your acts will change it. edit: The basilisk/mugging case is one-way causal—your actions matter, but the imagined blackmailer’s actions cannot change your behavior. If you draw a causal graph, there is no influence/action arrow that leads them to follow-through on the imagined threat.
- Vladimir_Nesov 10 Jan 2026 16:35 UTC
  3 points
  0
  Parent
  
  it’s acausal, so none of your acts will change it
  
  If it reasons about you, your acts determine its conclusions. If your acts fail to determine its conclusions, it failed to reason about you correctly. You can’t change the conclusions, but your acts are still the only thing that determines them.
  
  The same happens with causal consequences (physical future). They are determined by your acts in the past, but you can’t change the future causal consequences, since if you determine them in a certain way, they therefore were never actually different from what you’ve determined them to be, there was never something to change them from.
- Horosphere 10 Jan 2026 16:32 UTC
  1 point
  0
  Parent
  Comment withdrawn.
  - Dagon 10 Jan 2026 20:43 UTC
    2 points
    0
    Parent
    Within causal decision theory this is true, but if it were true in general then acausal decision theory would be pointless
    Acausal decision theory is pointless, sure. Are there any? TDT and FDT are distinct from CDT, but they’re not actually acausal, just more inclusive of causality of decisions. CDT is problematic only because it doesn’t acknowledge that the decisions being made themselves have causes and constraints.
    - [ ]
      [deleted]
jbash 10 Jan 2026 16:49 UTC
0 points
−2
Well, I don’t worry about acausal extortion because I think all that “acausal” stuff is silly nonsense to begin with.

I very much recommend this approach.

Take Roko’s basilisk.

You’re afraid that entity A, which you don’t know will exist, and whose motivations you don’t understand, may find out that you tried to prevent it from coming into existence, and choose to punish you by burning silly amounts of computation to create a simulacrum of you that may experience qualia of some kind, and arranging for those qualia to be aversive. Because A may feel it “should” act as if it had precommitted to that. Because, frankly, entity A is nutty as a fruitcake.

Why, then, are you not equally afraid that entity B, which you also don’t know will exist, and whose motivations you also don’t understand, may find out that you did not try to prevent entity A from coming into existence, and choose to punish you by burning silly amounts of computation to create one or more simulacra of you that may experience qualia of some kind, and arranging for those qualia to be aversive? Because B may feel it “should” act as if it had precommitted to that.

Why are you not worried that entity C, which you don’t know will exist, and whose motivations you don’t understand, may find out that you wasted time thinking about this sort of nonsense, and choose to punish you by burning silly amounts of computation to create one or more simulacra of you that may experience qualia of some kind, and arranging for those qualia to be aversive? Just for the heck of it.

Why are you not worried that entity D, which you don’t know will exist, and whose motivations you don’t understand, may find out that you wasted time thinking about this sort of nonsense, and choose to reward you by burning silly amounts of computation to create one or more simulacra that may experience qualia of some kind, and giving them coupons for unlimited free ice cream? Because why not?

Or take Pascal’s mugging. You propose to give the mugger $100, based either on a deeply incredible promise to give you some huge amount of money tomorrow, or on a still more incredible promise to torture a bunch more simulacra if you don’t. But surely it’s much more likely that this mugger is personally scandalized by your willingness to fall for either threat, and if you give the mugger the $100, they’ll come back tomorrow and shoot you for it.

There are an infinite number of infinitesimally probable outcomes, far more than you could possibly consider, and many of them things that you couldn’t even imagine. Singling out any of them is craziness. Trying to guess at a distribution over them is also craziness.
- Horosphere 10 Jan 2026 17:00 UTC
  −2 points
  0
  Parent
  ◉
Ustice 10 Jan 2026 20:05 UTC
−1 points
0
Nothing—that does not yet exist—wants to exist: it can’t. Only we that do exist, can want anything, including our own existence. If an entity doesn’t yet exist, then there is absolutely no qualia, so no desires. We can talk about them like they do, but that’s all it is.
Moreover so much more than what could exist does. It’s effectively infinite given the configuration space of the universe. Your expected value is the product of the value of whatever you’re considering and its likelihood. For every Basilisk, there could be an equally likely angel. The value of being tortured is negative and large, but finite: there are things that are worth enduring torture for. Finite/effectively-infinite is effectively-zero. Not something to be planning for or worrying about. Granted, this argument does depend on your priors.
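The "finite over effectively-infinite" step can be sketched in Python. This is only an illustration under loud assumptions: the disvalue `TORTURE_VALUE` is an arbitrary finite number, and the uniform weight of 1/N over N possibilities is a hypothetical prior chosen for simplicity, not something the argument establishes.

```python
from fractions import Fraction

def expected_value(value, n_possibilities):
    """Value times likelihood, assuming a uniform 1/N weight (an assumption)."""
    return Fraction(value, n_possibilities)

# Hypothetical finite disvalue: large and negative, but not infinite.
TORTURE_VALUE = -10**9

# As the space of equally weighted possibilities grows, the expected
# contribution of any single one shrinks toward zero.
for exponent in (10, 50, 100):
    ev = expected_value(TORTURE_VALUE, 10**exponent)
    print(f"N = 10^{exponent}: expected value = {float(ev):.3e}")

# An equally likely "angel" with the opposite value cancels the Basilisk term.
n = 10**50
net = expected_value(TORTURE_VALUE, n) + expected_value(-TORTURE_VALUE, n)
print(f"Basilisk plus equally likely angel: {net}")
```

With a non-uniform prior that concentrates weight on one scenario the numbers would look different, which is the dependence on priors noted above.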

Lastly, you don’t negotiate with terrorists. When my son was little and throwing tantrums, I’d always tell him that it wasn’t how he could get what he wants. If they are threatening to cause harm if you don’t comply, that’s their fault, not yours. You have no moral obligation to help them, and plenty to resist.

Roko’s Basilisk, The Holy Spirit, Santa Claus, and any other fictional or theoretical entity that might “want” me to change my behavior can get bent. 🖕🏻👾🖕🏻😇🖕🏻🎅🏼
Also, relatedly, here’s today’s relevant SMBC.
- Horosphere 10 Jan 2026 20:15 UTC
  −1 points
  −2
  Parent
  Comment withdrawn.
  - Ustice 19 Jan 2026 20:27 UTC
    1 point
    0
    Parent
    “Moreover so much more than what could exist does”
    Why would that be?
    Pure combinatorics. You could potentially have children with everyone you encounter. Now some of those are exceedingly unlikely, but even if ¹⁄₁₀₀ of them had a significant probability, that’s likely at least an order of magnitude more than the people that you do wind up having kids with. For every potential coparent, there are a lot more children that you could have, but won’t. There are just too many factors that determine the outcome of a pregnancy. Again, we’re talking orders of magnitude more potential children than actual children. When we talk about all of the possible states of the world, versus the actual state of the world, the difference in orders of magnitude is simply astronomical.
    
    Most things that could exist, don’t exist. There are far more possible worlds that have no Basilisk than ones that do. Now, you’re right to include how likely a particular potential world is, but even if we say in all worlds with AGI, humans are worse off, the likelihood of a Basilisk is vanishingly small, compared to all of the ways things could go wrong. Even in the worlds where there is a Basilisk, given variation in population, and AGI timelines, the chance of you being targeted is minuscule.
    
    I don’t think that the nature of the torture matters. I think that I could think of a scenario where it would be worth enduring. It’s hard to balance torture against the welfare of others, but once we are in the billions, that feels pretty clear to me. The negative value of being tortured for 10,000 years can’t possibly be lower than the torture and deaths of billions of people. There is always a scenario where it is worth enduring. The risk is always finite.
    
    But let’s take a step back, and presume that a Basilisk scenario is possible. What harms are you willing to do to make sure it is created? Would you create such a monster? Even in a world where a Basilisk is inevitable, what harms would you cause? Would they be worth it? What if it decides to just go ahead and torture you anyway?
    There is no reason to cooperate with something so horrible: it can’t be reasoned with nor negotiated with—causally or not.
    
    It’s astronomically unlikely to happen, and if it did there is no value in cooperating. If you create it, then you are the monster, whether you were inspired by Roko or not.
    Roko’s Basilisk is an intellectual trap of your own making. It’s a delusion: a rationalization of the irrational. It’s not worth thinking of, and especially not worth buying into.
    It might make a good novel though.
    - Horosphere 20 Jan 2026 11:06 UTC
      1 point
      0
      Parent
      Comment withdrawn.
      - Ustice 23 Jan 2026 1:30 UTC
        1 point
        0
        Parent
        So go back. Why is it unlikely that an ASI would reward those that help create it, rather than punish those that don’t? You dismissed angels, but this seems to me the far more likely scenario. It’s basically the default, otherwise what’s the point of building them in the first place? Now that doesn’t mean the angel doesn’t kill us all too, but it doesn’t engage in all this torture and acausal trade nonsense.
        I just don’t understand why this particular scenario seems likely. Especially since it’s unlikely to work, given how most people don’t give it much credence.
        
        I’m just not about to change my life and become a supervillain henchman, but if some ASI slid into my DMs and said, “Yo, Jason. I’ll give you $2 million to write some software for me. Here’s proof I’m sincere,” I’d at least listen and ask about the benefits package.
        
        There is no thought trap, other than what you create for yourself.
        
        Let’s consider a functionally equivalent ASI scenario to a Basilisk. Let’s call it Jason’s Hobgoblin. An ASI comes into existence and decides to ultra-torture everyone, with maybe some small chance of a reprieve based on whether it likes you or not. No acausal trade. It just sees who helped it exist, and chooses to make some of them its pets. The Hobgoblin takes up a bunch of space of the Basilisk futures.
        
        Now, do you change your life to try to get on its good side before it even exists? I don’t think so: it’s crazy. How can you really understand why the Hobgoblin likes you, or does what it does?
        I think that a chance for a reward from a Basilisk is equally inscrutable. You’re already considering cooperating with it, so it doesn’t have to actually cooperate with you. You have no way of knowing if it will cooperate with you, and it’s not actually incentivized to.
        
        Why cooperate when you have no idea what the actual effect will be? Well, other than the damage you might do as its henchman. And the cost to your mental health as you go around the anxiety loop.
        
        Even if you believe the Basilisk is a likely future, there’s no reason to cooperate with it, or give it further thought than any other possible future.
        
        If the Hobgoblin splits the Basilisk probability space, then it’s likely that there are other similar scenarios that do as well. Maybe an Angel is a Hobgoblin in disguise? Doesn’t this lead us back to the Basilisk not being a particularly likely possible future given all of the alternatives?
        
        If the Basilisk is just a story, then it is not worth worrying about. If the Basilisk is just one of any number of possible futures, then there is no reason to give it special attention. If the Basilisk is the future, then there is no point in cooperating with it.
        Horosphere 23 Jan 2026 12:05 UTC
        1 point
        0
        Parent
        Comment withdrawn.
        Ustice 25 Jan 2026 19:12 UTC
        2 points
        0
        Parent
        Well, those are my best arguments. I hope I’ve been helpful in some way.
        Horosphere 26 Jan 2026 11:23 UTC
        1 point
        0
        Parent
        Thanks for engaging with my question.

romeostevensit 10 Jan 2026 19:14 UTC
14 points
8
I’m already precommitted to ally against utility inverters and 2nd order enforcement: anyone who feeds utility inverters.
- Horosphere 10 Jan 2026 19:22 UTC
  1 point
  0
  Parent
  Comment withdrawn.
  - romeostevensit 10 Jan 2026 19:26 UTC
    8 points
    4
    Parent
    No, because I expect the most powerful cooperator networks to be more powerful than the largest defector networks for structural reasons.
    - Horosphere 10 Jan 2026 19:30 UTC
      1 point
      0
      Parent
      ,
      - Raemon 10 Jan 2026 20:39 UTC
        9 points
        2
        Parent
        “Cooperate to generally prevent utility-inversion” is simpler and more Schelling than all the oddly specific reasons one might want to utility-invert.
        [ ]
        [deleted]