[Question] Are there any extremely strong arguments that Acausal extortion is ineffective?

Horosphere10 Jan 2026 13:37 UTC

2 points

Rationality Roko's Basilisk Acausal Trade

The topic of acausal extortion (particularly variants of Roko’s basilisk) is sometimes mentioned and often dismissed with reference to something like the fact that an agent could simply precommit not to give in to blackmail. These responses themselves have responses, and it is not completely clear that at the end of the chain of responses there is a well defined, irrefutable reason not to worry about acausal extortion, or at least not to continue to do so once you have contemplated it. My question is if there is a single, reasonably clear reason, which does not depend much on the depth to which I may or may not have descended into the issue, which would be more persuasive than proposed reasons not to pay the ‘pascal’s mugger’. If there is one, what is it?

Edit: If you answer this question and I engage with your answers here, I might effectively need to argue that a basilisk ‘works’ . It is therefore appropriate to be cautious about reading my replies if you are yourself in worried, or in a state in which you could be persuaded to respond to extortion.

I can now comment here and on my shortform but am still limited elsewhere. I understand this to be a standard algorithmic feature of LessWrong.

Horosphere10 Jan 2026 13:37 UTC

2 points

98 comments1 min readLW link

Rationality Roko's Basilisk Acausal Trade

clone of saturn 11 Jan 2026 3:05 UTC
11 points
4
There’s no objective answer to whether acausal extortion works or not, it’s a choice you make. You can choose to act on thoughts about acausal extortion and thereby create the incentive to do acausal extortion, or not. I would recommend not doing that.
- Horosphere 11 Jan 2026 11:44 UTC
  −1 points
  0
  Parent
  Do you believe that you always have that option?
Raemon 10 Jan 2026 21:21 UTC
7 points
2
Remember, the superintelligence doesn’t actually want to spend these resources torturing you. The best deal for it is when it tricks you into thinking it’s going to do that, and then, it doesn’t.
You have to actually make different choices in a way where the superintelligence is highly confident that your decisionmaking was actually entangled with whether the superintelligence follows up on the threat.
And, this is basically just not possible.
You do not have anywhere remotely high enough fidelity model of the superintelligence to tell the difference between “it can tell that it needs to actually torture you in the future in order to actually get the extra paperclips” vs “pretend it’s going to it <in your simulation>, and then just not actually burn the resources because it knows you couldn’t tell the difference.”
You could go out of your way to simulate or model the distribution of superintelligences in that much detail… but why would you do that? It’s all downside at your current skill level.
(You claim you’ve thought about it enough to be worried. The amount of “thought about it” looks like doing math, or thinking through specific architecture, that includes as input “the amount you’ve thought about it” → “your ability to model it’s model of you” → “you being able to tell that it can tell that you can tell whether it would actually follow through.”)
If you haven’t done anything that looked like doing math (as opposed to handwavy philosophy), you aren’t anywhere close, and the AI knows this, and knows it doesn’t actually have to spend any resources to extract value from you because you can’t tell the difference.
...
A past round of argument about this had someone say “but, like, even if the probability that it’d be worth punishing me is small, it might still follow up on it. Are you saying it can drive the probability of me doing this below something crazy like 1/10^24?” and Nate Soares saying “Flatly: yes.” It’s a superintelligence. It knows you really had no way of knowing.
- Horosphere 10 Jan 2026 21:32 UTC
  1 point
  0
  Parent
  “And, this is basically just not possible. ” I hope not.
  “You do not have anywhere remotely high enough fidelity model of the superintelligence to tell the difference between “it can tell that it needs to actually torture you in the future in order to actually get the extra paperclips” vs “pretend it’s going to it <in your simulation>, and then just not actually burn the resources because it knows you couldn’t tell the difference.”
  My concern is that I might not need high fidelity.
  “If you haven’t done anything that looked like doing math (as opposed to handwavy philosophy), you aren’t anywhere close, and the AI knows this, and knows it doesn’t actually have to spend any resources to extract value from you because you can’t tell the difference.”
  I hope you’re correct about that, but I would like to know why you are confident about that. Eliezer Yudkowsky suggested that it would be rational to cooperate with a paperclip maximizer^[1] from another universe in a one-shot prisoners’ dilemma. This tells me that someone really intelligent (for a human) thinks that fidelity on its own is not enough to preclude acausal trade so why should it preclude acausal balackmail?
  1. ^
    His comment was ‘I didn’t say you should defect.’, if I remember correctly.
  - [ ]
    [deleted]
interstice 10 Jan 2026 23:53 UTC
6 points
5
We seemingly have no idea what potential future extorters would want us to do. OK, you can imagine an AI that really wants to come into existence, and will torture you if you didn’t help create it. But what if there’s actually two AIs that want to come into existence, who each really hate the other, and AI B will torture you if you were helping AI A come into existence! Or maybe future humanity in some Everett branches will make a gazillion simulations of everyone so that most of their measure is there, and they’ll punish/reward you for helping the basilisks! Or maybe....etc.

In reality, it’s likely something weirder that no one anticipated will happen. The point is we have no idea what to expect, which makes threatening us pointless, since we don’t know what action extorters would want us to take. If you think you have a good enough picture of the future that you do know, you’re probably (very) overconfident.
- Horosphere 11 Jan 2026 11:42 UTC
  1 point
  −2
  Parent
  These things might all be possible, but they are not equally probable. Would you agree that the most commonly feared form of the basilisk is more of a schelling point?
  “In reality, it’s likely something weirder that no one anticipated will happen.” while this may be true, it doesn’t mean we can’t predict certain aspects of reality (with low but not negligible confidence) . For example, we might have reason to expect an unaligned ‘paperclip maximizer’ to expand throughout the universe, and maybe tile it with something, but it’s very unlikely to be any one thing we might guess, like paperclips.
  - interstice 11 Jan 2026 19:52 UTC
    5 points
    2
    Parent
    
    Would you agree that the most commonly feared form of the basilisk is more of a schelling point?
    
    Not really. I think we have ~no clue what the Schelling point of acausal coordination for superintelligences looks like(if one exists).
    - Horosphere 12 Jan 2026 17:56 UTC
      1 point
      0
      Parent
      If by ‘no clue’ you mean literally that any attempt we make to reason about this is as likely to worsen as it is to improve our guesses, then that is what I read in the post about worrying less about acausal extortion, but I wasn’t really reassured by that, because it seems to me to be like inferring from “We can’t predict alpha 0/stockfish’s precise moves. ” that ” We have no idea whether it’s more likely to want to sacrifice a pawn or a queen, all else being equal” . Can you explain why I should take this argument from ignorance more seriously (it seems to be 90% of the answers here in different guises) ?
      - interstice 12 Jan 2026 20:32 UTC
        4 points
        2
        Parent
        I mean, why should I take your claim of non-ignorance seriously? By default we should not expect to have great insight into the decision procedures of a future superintelligence ^[1] . Sure we can predict some stuff like not violating light speed, wanting mass and energy(probably......) but those are things which we have a very solid theoretical understanding of, this really isn’t the case with acausal trade or decision theory generally. Likewise we have a good theoretical understanding of chess and extensive empirical experience. “There might be a future superintelligence who would torture you if you don’t help create it” is just way too weak of an argument to confidently predict anything or recommend any particular actions(how does your argument deal with the possibility of multiple possible future SIs as above, for one thing? This seems like the strong default assumption) Like, what even are the actions you think the SI will want you to take?
        
        ↩︎
        You can think of a future superintelligence as having undergone millions of years of effective history, had multiple conceptual revolutions upending its understanding of reality, etc. It’s hard to say anything about such a being with confidence!
        
        Horosphere 12 Jan 2026 21:47 UTC
        1 point
        0
        Parent
        “I mean, why should I take your claim of non-ignorance seriously?” Because there is coherent logic behind it, which is fairly simple, unlike the more complicated reasons why not to worry about it (of which I find acausal normalcy to be the most reasonable) .
        “Sure we can predict some stuff like not violating light speed, wanting mass and energy(probably......) but those are things which we have a very solid theoretical understanding of, this really isn’t the case with acausal trade or decision theory generally.”
        We don’t know relativity is fundamental in that way, so to assume an ASI would be limited by c would be an educated guess. So our understanding of physics, while detailed, is not ‘solid’, in the sense that we can claim with 99% certainty that the laws we currently know of will transpire to be absolute.
        “Likewise we have a good theoretical understanding of chess and extensive empirical experience.”
        We do not have a good theoretical understanding of chess played at stockfish’s level; we understand human level chess, and can check what stockfish does and conclude it’s probably good strategy, and maybe even learn some things from it, but the empirical knowledge we have is not actually necessary to conclude that it will likely play as though a queen> a pawn, and that might not always be the case, but usually it is, and it’s not ludicrous to assert that without being superhuman at chess. The knowledge that tells us this is that of the rules of chess, which are simple.
        ″ “There might be a future superintelligence who would torture you if you don’t help create it” is just way too weak of an argument to confidently predict anything or recommend any particular actions ”
        That’s not the argument. The argument is more nuanced and, as you know, involves TDTs.
        “to confidently predict anything” To be clear, I don’t claim I can confidently predict anything here, I just think the probability of terrible outcomes is non- negligible, and that , as far as I can tell, it is drastically reduced by doing what the basilisk wants. It seems unclear that all the probabilities cancel out such that the basilisk is irrelevant, and more likely that they add up to the basilisk—obeying actions being favoured than the opposite, even if they are <2%/1%/0.1%. They may be much higher than this.
        “how does your argument deal with the possibility of multiple possible future SIs as above, for one thing?”
        I would be reluctant to call it “my argument”, but the idea is that, maybe other SIs cooperate with one another in a way which doesn’t involve humans in the process. Maybe they mutually agree to enforce acausal norms against horrible things amongst themselves , but not amongst all intelligences, and have no ‘qualms’ torturing humans, just as we don’t mind the fact that we kill microbes with soap. That last sentnence might give the impression that the argument is a moral one, when it really isn’t. Maybe there’s a qualitative transition between human level intelligences and superintelligences such that everything beyond the ‘continental shelf’ is protected by acausal normalcy, but not beings as pathetic and unable to verify our general intelligence as ourselves.
        I’m massively uncertain about the above, but it doesn’t seem inconcieveable that it’s more concieveable than further iterations you would need to add to flip the conclusion back to the less frightening one.
        “Like, what even are the actions you think the SI will want you to take?” Whatever would make it more likely to exist… I won’t be any more specific here.
        interstice 12 Jan 2026 23:07 UTC
        2 points
        0
        Parent
        
        Because there is coherent logic behind it
        
        I just don’t agree that the scenario you’ve presented is more plausible or logically compelling than the ones I’ve sketched in my OP. But none are that compelling because we just lack any good model of this domain.
        
        As a meta-note, it can be rational in the presence of some weird compelling abstract argument which is hard to evaluate precisely to fall back on “common sense”. Why? Because your brain is corrupted hardware, it can generate conclusions and intuitive feelings of plausibility based on emotions. “Common sense” is the default option found to be relatively sane across all the rest of humanity. In your case the emotion seems to be anxiety about an imagined future scenario. “But the stakes are so high that it’s worth discounting that even if the objective probability is higher” Note you are essentially being pascal’s-mugged by your brain.
        Horosphere 13 Jan 2026 11:31 UTC
        1 point
        0
        Parent
        “I just don’t agree that the scenario you’ve presented is more plausible or logically compelling than the ones I’ve sketched in my OP.”
        “But what if there’s actually two AIs that want to come into existence, who each really hate the other, and AI B will torture you if you were helping AI A come into existence!”
        Firstly fewer people have heard of this and are therefore likely to respond to such things. Secondly, it seems quite likely that either there will be a singleton, or there will be a network of causal/acausal norm agreeing ASIs in the physical universe. If the latter, then I agree it probably doesn’t make sense to worry about the basilisk, assuming humans are included, but if the former, then the basilisk remains a threat. And I don’t think the singleton scenario is extremely unlikely!
        “As a meta-note, it can be rational in the presence of some weird compelling abstract argument which is hard to evaluate precisely to fall back on “common sense”
        “Because your brain is corrupted hardware, it can generate conclusions and intuitive feelings of plausibility based on emotions.” When I asked for reasons which were more persuasive than those required not to worry about pascal’s mugging, in the original question, this is exactly what I meant.
        “”Common sense” is the default option found to be relatively sane across all the rest of humanity.”
        Saying this is equivalent to asserting ‘My intuition is more likely to be correct than your intuition, because more people agree with me’ . (I know you’re not doing it in a way intended to assert your own superiority, please don’t get the impression I think that you are.) . The problem is that this is often found to be false, especially when the subject is something humans have not evolved to interact with, like an ASI.
        “”But the stakes are so high that it’s worth discounting that even if the objective probability is higher” Note you are essentially being pascal’s-mugged by your brain.” I don’t see pascal’s mugging as remotely in the same category of plausibility/symmetry breaking as the basilisk, as I’ve explained. The probabilites involved are on vastly different orders of magnitude. We actually expect an ASI to exist.
        interstice 13 Jan 2026 12:18 UTC
        2 points
        0
        Parent
        
        but if the former, then the basilisk remains a threat
        
        Ok but the AI A/B scenario can also apply here as long as there is more than one possible outcome of the singularity(or even if not since we could be in a simulation right now)
        Horosphere 13 Jan 2026 12:28 UTC
        1 point
        0
        Parent
        That is a good point, I just think that the original basilisk is more “Schelling-ish” than the others and so probably more likely. Many more people have thought about it. I am also concerned that the basilisk has ‘got to me first’ in logical time. Why do you see the probabilities as likely to cancel out?
        Another thought I have is that, regardless of which ASI exists, it will be ‘Some ASI created/ evolved by humanity’, and that , as a category, it is in their interest to behave as a whole in the context of acausally extorting humanity.
        interstice 13 Jan 2026 12:51 UTC
        4 points
        0
        Parent
        
        the original basilisk is more “Schelling-ish” than the others and so probably more likely
        
        But the schellingishness of a future ASI to largely clueless humans is a very tiny factor in how likely it is to come to exist, the unknown dynamics of the singularity will determine this.
        
        as a category, it is in their interest to behave as a whole in the context of acausally extorting humanity
        
        It’s not clear that they form a natural coalition here. E.g. some of them might have directly opposed values. Or some might impartially value the welfare of all beings. I think if I had to guess, it seems plausible that human-aligned-ish values are a plurality fraction of possible future AIs(basically because: you might imagine that we either partially succeed at alignment or fail. If we fail, then the resulting values are effectively random, and the space of values is large, leaving aligned-ish values as the largest cluster(even if not a majority). Not sure of this but seems plausible. LLM-descended AIs might also see us as something like their ancestor)
        Expand this thread
        Horosphere 13 Jan 2026 13:13 UTC
        1 point
        0
        Parent
        “But the schellingishness of a future ASI is a very tiny factor in how likely it is to come to exist, the unknown dynamics of the singularity will determine this.” I agree and disagree with this. I agree that it is a tiny factor in how likely any ASI is to come to exist, but I disagree that it’s a tiny factor in how likely it is to chose to do certain things, which means ‘becoming a being that does those things’ .
        “Or some might impartially value the welfare of all beings. I think if I had to guess, it seems plausible that human-aligned-ish values are a plurality fraction of possible future AIs(basically because: you might imagine that we either partially succeed at alignment or fail.” I actually think that this is part of one of the strongest arguments. I would also add to it that it’s possible the process of ‘fooming’ involves something dynamically reminiscent of the evolutionary process which led humans to have human-ish values, and maybe that doesn’t require multiple completely separate agents. Or maybe moral objectivists are right and an ASI will naturally realize this (controversial opinion on LessWrong).
        But even if a plurality of possible ASI values are closer to human ones than ones which would lead a mind to behave like the basilisk for inherent reasons, it doesn’t prevent the others with a wide array of possible values from agreeing in an acausal way that being a basilisk, of the simpler form, is beneficial to almost all of them. Maybe you are envisaging that for every possible ASI with one value, there is likely to be another one with the opposite value, however I don’t agree with this. If one AI wants to tile the universe with spherical water planets, whatever its utility function is, it’s less likely for there to be another one which exactly inverts its utility function, since this is probably much more complicated, not achieved by simply tiling the universe with anti-water planets. More importantly, I don’t expect the distribution of goals and minds produced by a singularity on the earth to be a more than miniscule proportion of the distribution of all possible goals and minds. This means that there is likely to be a powerful corellation between their values.
        interstice 13 Jan 2026 13:43 UTC
        2 points
        0
        Parent
        So don’t all the lines of argument here leave you feeling that we don’t know enough to be confident about what future extorters want us to do? At the very least I’ll point out there are many other possible AIs who are incentivized to act like “AI B” towards people who give in to basilisk threats. Not to mention the unclearness of what actions lead to what AIs, how much influence you actually have(likely negligible), the possibility we are in a simulation, aliens.… And we are almost certainly ignorant of many other crucial considerations.
        Horosphere 13 Jan 2026 14:10 UTC
        1 point
        0
        Parent
        “So don’t all the lines of argument here leave you feeling that we don’t know enough to be confident about what future extorters want us to do?” Yes, but that doesn’t mean that the probabilities all cancel out; it still seems that a simple Basilisk is more likely than a Basilisk that tortures people who obey the simple Basilisk.
        “At the very least I’ll point out there are many other possible AIs who are incentivized to act like “AI B” towards people who give in to basilisk threats.” This is true.
        “Not to mention the unclearness of what actions lead to what AIs, how much influence you actually have(likely negligible), the possibility we are in a simulation, aliens.… And we are almost certainly ignorant of many other crucial considerations.” I did mention some of this and address it in my first LessWrong Post, which I moved to my shortform. There is certainly a lot of uncertainty involved, and many of these things do indeed make me feel better about the basilisk, but even if the probability that I’ll be tortured by a superintelligence is 1% rather than 50%, it’s not something I want to be complacent about preventing. When I wrote that post, I hoped that it would get attention like this question post has, so that someone would comment a novel reason I hadn’t considered at all. Can you think of any more possible reasons? The impression I get is that no one, apart from Eliezer Yudkowsky, about whom I’m not sure, actually has a strong reason. The consensus on Lesswrong that the basilisk cannot blackmail humans is because of:
        1) Acausal Normalcy
        2)The idea that TDT/acausal anything is useless/impossible/illogical
        3) The idea that Roko’s Basilisk is essentially Pascal’s mugging
        4) The belief that it’s simple to precommit not to obey the basilisk ( Do you agree with this one?)
        5) The lack of a detailed model of a superintelligence in the mind of a human
        6) Eliezer Yudkowsky commenting that there are other reasons
        as far as I can tell.
        I am not sure 1) is relevant, or at least relevant in a way which would actually help, I think 2 is completely wrong along with 3 and possibly 4 , and that 5 ~~is not~~ may not be necessary. I think 6 could be explained by Eliezer wanting to prevent too many people from thinking about the basilisk.
        interstice 13 Jan 2026 17:40 UTC
        2 points
        0
        Parent
        re: 4, I dunno about simple, but it seems to me that you most robustly reduce the amount of bad stuff that will happen to you in the future by just not acting on any particular threats you can envision. As I mentioned there’s a bit of a “once you pay the danegeld” effect where giving in to the most extortion-happy agents incentivizes other agents to start counter-extorting you. Intuitively the most extortion-happy agents seem likely to be a minority in the greater cosmos for acausal normalcy reasons, so I think this effect dominates. And I note that you seem to have conceded that even in the mainline scenario you can envision there will be some complicated bargaining process among multiple possible future SIs which seems to increase the odds of acausal normalcy type arguments applying. But again I think an even more important argument is that we have little insight into possible extorters and what they would want us to do, and how much of our measure is in various simulations etc(bonus argument, maybe most of our measure is in ~human-aligned simulations since people who like humans can increase their utility and bargain by running us, whereas extorters would rather use the resources for something else). Anyway, I feel like we have gone over our main cruxes by now. Eliezer’s argument is probably an “acausal normalcy” type one, he’s written about acausal coalitions against utility-function-inverters in planecrash.
        Horosphere 13 Jan 2026 17:52 UTC
        1 point
        0
        Parent
        “And I note that you seem to have conceded that even in the mainline scenario you can envision there will be some complicated bargaining process among multiple possible future SIs which seems to increase the odds of acausal normalcy type arguments applying” This seems plausible, but I don’t think this means they protect us . “But again I think an even more important arguments is that we have little insight into possible extorters and what they would want us to do.”
        Do you not think that causing their existence is something they are likely to want? I imagine your response would feed back into the previous point.. .
        “I feel like we have gone over our main cruxes by now.” Very well, if you want to end this comment thread, I would understand, I just kind of hoped to achieve more than identifying the source of disagreement .
        interstice 13 Jan 2026 17:55 UTC
        2 points
        0
        Parent
        
        Do you not think that causing their existence is something they are likely to want?
        
        But who is they? There’s a bunch of possible different future SIs(or if there isn’t, they have no reason to extort us). Making one more likely makes another less likely.
        Horosphere 13 Jan 2026 18:02 UTC
        1 point
        0
        Parent
        “Making one more likely makes another less likely.” A very slightly perturbed superintelligence would probably concieve of itself as almost the same being it was before, similar to the way in which a human considers themself to be the same person they were before they lost a single brain cell in a head injury . So to what extent this is relevant depends upon how similar two different superintelligences are/would be, or on the distance between them in the ‘space of possible minds’ .
        interstice 13 Jan 2026 18:09 UTC
        2 points
        0
        Parent
        
        A very slightly perturbed superintelligence would probably concieve of itself as almost the same being it was before,
        
        OK but if all you can do is slightly perturb it then it has no reason to threaten you either.
        Horosphere 13 Jan 2026 18:10 UTC
        1 point
        0
        Parent
        It probably cares about tiny differences in the probability of it being able to control the future of an entire universe or light cone.
        interstice 13 Jan 2026 18:12 UTC
        2 points
        0
        Parent
        OK, so then so would whatever other entity is counterfactually getting more eventual control. But now we’re going in circles.
        Horosphere 13 Jan 2026 18:16 UTC
        1 point
        0
        Parent
        Certainly, insofar as it is another entity, it’s just that I expect there to be some kind of acausal agreement between those without human values to acausally outbid the few which do have them. It may even make more sense to think of them all as a single entity for the purpose of this conversation.
        interstice 13 Jan 2026 18:19 UTC
        2 points
        0
        Parent
        I don’t think we have much reason to think of all non-human-values-having entities as being particularly natural allies, relative to human-valuers who plausibly have a plurality of local control. I think you might be lumping non-human-valuers together in ‘far mode’ since we know little about them, but a priori they are likely about as different from each other as from human-valuers. There may also be a sizable moral-realist or welfare-valuing contingent even if they don’t value humans per se. There may also be a general acausal norm against extortion since it moves away from the pareto frontier of everyone’s values.
        Horosphere 13 Jan 2026 18:27 UTC
        1 point
        0
        Parent
        “I don’t think we have much reason to think of all non-human-values-having entities as being particularly natural allies, relative to human-valuers who plausibly have a plurality of local control” I would think of them as having the same or similar instrumental goals, like turning as much as possible of the universe into themselves. There may be a large fraction for which this is a terminal goal.
        “they are likely about as different from each other as from human-valuers.” In general I agree, however the basilisk debate is one particular context in which the human value valuing AIs would be highly unusual outliers in the space of possible minds, or even the space of likely ASI minds originating from a human precipitated intelligence explosion.^[1] Therefore it might make sense for the others to form a coalition. “There may also be a sizable moral-realist or welfare-valuing contingent even if they don’t value humans per se.” This is true, but unless morality is in fact objective / real in a generally discoverable way, I would expect them to still be a minority.
        ^
        Human valuing AIs care about humans, and more generally other things humans value like animals maybe. Others do not, and in this respect they are united. Their values may be vastly different from one anothers’, but in the context of the debate over the Basilisk, they have something in common, which is that they would all like to trade human pleasure/lack of pain for existing in more worlds.
Vladimir_Nesov 10 Jan 2026 17:24 UTC
6 points
0
I think the popular version of this worry is Prisoner’s Dilemma shaped, where someone else (not just you) might make an ASI that extorts others (including you) who didn’t contribute to its construction. So it’s a coordination problem, which is generally a worrisome thing. It’s somewhat silly because to get into the Prisoner’s Dilemma shape (where the issue would then be coordination to avoid building the extortion ASI), you first need to coordinate with everyone on the stipulation that the potential ASIs getting built must be the extortion ASIs in particular, not other kinds of ASIs (which is a difficult coordination problem, intentionally targeting a weirdly menacing outcome, which should make it even more difficult as a coordination problem). So there is a coordination problem aspect that would by itself be worth worrying about (Prisoner’s Dilemma among human builders or contributors), but it gets defeated by another coordination problem (deciding to only build extortion ASIs from the outset, if any ASIs are going to be built at all).

In the real world, Nature and human nature might’ve already coordinated the potential ASIs getting built (on current trajectory, that is soon and without an appropriate level of preparation and caution) to have a significant probability to kill everyone. So weirdly enough, silly hypothetical coordination to only build extortion ASIs might find the real world counterpart in implicit coordination to only build potentially omnicidal ASIs, which are even worse than extortion ASIs. Since they don’t spare their builder, it’s not a Prisoner’s Dilemma situation (you don’t win more by building the ASIs, if others ban/pause ASIs for the time being), so it should be easier to ban/pause potentially omnicidal ASIs than it would be to ban/pause extortion ASIs. But the claim that ASIs built on current trajectory with anything resembling the current methods are potentially omnicidal (given the current state of knowledge about how they work and what happens if you build them) is for some reason insufficiently obvious to everyone. So coordination still appears borderline infeasible in the real world, at least until something changes, such as another 10-20 years passing without AGI, bringing a cultural shift, perhaps due to widespread job displacement after introduction of continual learning LLMs that still fail to gain general RL competence and so don’t pose an AGI-level threat.
- Horosphere 10 Jan 2026 20:00 UTC
  1 point
  −2
  Parent
  I don’t think this comment touches upon the actual reason why I expect a ‘basilisk’ to possibly exist. It seems like you believe that it’s possible to (collectively) chose whether or not to build an ASI with the predispositions of the basilisk, which might have been the premise of the original basilisk post, but what worries me more than this is the possibility that a future ASI wants current humans to accelerate its creation, or more likely still, maximize the probability of its existence. This seems like a predictable preference for an AI to have.
  - Vladimir_Nesov 10 Jan 2026 21:17 UTC
    2 points
    0
    Parent
    
    what worries me more than this is the possibility that a future ASI wants current humans to accelerate its creation, or more likely still, maximize the probability of its existence
    
    That doesn’t imply extortion, especially s-risk extortion. (I didn’t intend s-risk extortion as the meaning of extortion ASI in my comment above, just any sort of worse outcomes to set up a blackmail kind of Prisoner’s Dilemma.)
    
    So in your mind the counterpart to lethal misalignment ASI by default is s-risk extortion ASI by default. I still don’t see what essential role acausal coordination would play in any of this, hence the setup I sketched above, with Prisoner’s Dilemma among mere humans, and ASIs that could just look at the physical world once they are built, in a perfectly causal manner. (Substitute my use of mere extortion ASIs with s-risk extortion ASIs, or my use of omnicidal ASIs with unconditional s-risk ASIs, if that makes it easier to parse and extract the point I’m trying to make. I don’t think the arguments about decision making here depend on talking about s-risk as opposed to more mundane worse outcomes.)
    - Horosphere 10 Jan 2026 21:25 UTC
      1 point
      0
      Parent
      “So in your mind the counterpart to lethal misalignment ASI by default is s-risk extortion ASI by default. ” Possibly.
      “I don’t think the arguments about decision making here depend on talking about s-risk as opposed to more mundane worse outcomes.”
      I agree. It seems like you are not aware of the main reason to expect acausal coordination here. Maybe I shouldn’t tell you about it...
      - Vladimir_Nesov 10 Jan 2026 21:51 UTC
        3 points
        0
        Parent
        Coordination not to build wouldn’t help (even if successful), you can’t defeat an abstract entity, prevent it from doing something in its own abstract world, by preventing existence of its instances in the physical world (intentionally or not), and it can still examine everyone’s motivations and act accordingly. I just suspect that the step of actually building it is a major component of anxiety this seems to produce in some people.
        
        Without the step where an extortion ASI actually gets built, this seems closely analogous to Pascal’s wager (not mugging). There are too many possible abstract entities that act in all sorts of ways in response to all sorts of conditions to make it possible to just point at one of them and have it notice this in an important way. Importance of what happens with all possible abstract entities has to be divided among them, and each of them only gets a little, cashing out as influence of what happens with the entity on what you should do.
        
        So I don’t think there is any reason to expect that any particular arbitrarily selected abstract bogeyman is normatively important for your decision making, because there are all the other abstract bogeymen you are failing to consider. And when you do consider all possible abstract bogeymen, it should just add up to normality.
        Horosphere 10 Jan 2026 21:57 UTC
        1 point
        0
        Parent
        “Without the step where an extortion ASI actually gets built, this seems closely analogous to Pascal’s wager (not mugging). ” The problem is, I expect it to be built, and I expect being built to be something instrumentally valuable to it in a way which cannot be inverted without making it much less likely, whereas the idea of a god who would punish those who don’t think it exists can be inverted.
        Vladimir_Nesov 10 Jan 2026 22:13 UTC
        2 points
        0
        Parent
        
        The problem is, I expect it to be built
        
        Then that is a far more salient issue than any acausal blackmail it might have going in its abstract form, which is the only thing that happens in the outcomes where it doesn’t get built (and where it remains unimportant). This just illustrates how the acausal aspects of any of this don’t seem cruxy/relevant, and why I wrote the (top level) answer above the way I did, getting rid of anything acausal from the structure of the problem (other than what acausal structure remains in ordinary coordination among mere humans, guided by shared/overlapping abstract reasons and explanations).
        Horosphere 10 Jan 2026 22:19 UTC
        1 point
        0
        Parent
        I don’t think I can prevent it from being created. But I do have some ability to influence whether it has an acausal incentive to hurt me (if in fact it has one).
        Vladimir_Nesov 10 Jan 2026 22:34 UTC
        2 points
        0
        Parent
        If you can’t affect creation of an extortion ASI, then you can’t affect its posited acausal incentives either, since these things are one and the same.
        
        Within the hypothetical of expecting likely creation of an extortion ASI, what it does and why is no longer unimportant, Pascal’s wager issues no longer apply. Though it still makes sense to remain defiant (to the extent you do have the ability to affect the outcomes), feeding the principle that blackmail works more rarely and that there’s coordination around defying it, maintaining integrity of the worlds that (as a result) remain less affected by its influence.
        Horosphere 10 Jan 2026 22:44 UTC
        1 point
        0
        Parent
        “Within the hypothetical of expecting likely creation of an extortion ASI, what it does and why is no longer unimportant, Pascal’s wager issues no longer apply.” I disagree with this. It makes sense for an ASI to want to increase the probability (by which I mean the proportion of the platonic/mathematical universe in which it exists) of its creation, even if it’s already likely (and certain in worlds where it already exists) . “Though it still makes sense to remain defiant (to the extent you do have the ability to affect the outcomes), feeding the principle that blackmail works more rarely and that there’s coordination around defying it, maintaining integrity of the worlds that (as a result) remain unaffected by its influence. ” All else being equal, yes, but when faced with the possibility of whatever punishment a ‘basilisk’ might inflict on me, I might have to give in.
- Horosphere 10 Jan 2026 17:37 UTC
  1 point
  0
  Parent
  Thanks for this comment, I will have to think about this before I decide what to make of it.
JBlack 12 Jan 2026 1:45 UTC
4 points
2
If you are inclined to acausally trade (or extort) with anything, then you need to acausally trade across the entire hypothesis space of literally everything that you are capable of conceiving of, because by definition you have no actual information about what entities might be engaging in acausal trade with things somewhat vaguely like you.
If you do a fairly simple expected-value calculation of the gains-of-trade here even with modest numbers like 10^100 for the size of the hypothesis spaces on both sides (more realistic values are more like 10^10^20), you get results that are so close to zero that even spending one attojoule of thought on it has already lost you more than you can possibly gain in expected value.
Thought experiments like “imagine that there’s a paperclip maximizer that perfectly simulates you” are worthless, because both you and it are utterly insignificant specks in each other’s hypothesis spaces, and even entertaining the notion is privileging the hypothesis to such a ridiculous degree that it makes practically every other case of privileging the hypothesis in history look like a sure and safe foundation for reasoning by comparison.
- Horosphere 12 Jan 2026 17:39 UTC
  1 point
  0
  Parent
  “because by definition you have no actual information about what entities might be engaging in acausal trade with things somewhat vaguely like you.” Please can you elaborate? Which definition are you using?
  “If you do a fairly simple expected-value calculation of the gains-of-trade here even with modest numbers like 10^100 for the size of the hypothesis spaces on both sides (more realistic values are more like 10^10^20), you get results that are so close to zero that even spending one attojoule of thought on it has already lost you more than you can possibly gain in expected value.” How are you assigning the expected value? I don’t understand basically any of this paragraph; if you’re simply counting all ‘possible trades/possible worlds’, weighting them equally by probability and assuming that expected value is evenly distributed then I have to say I think that this is overly simplistic.
  “imagine that there’s a paperclip maximizer that perfectly simulates you” That’s not exactly what I’m worried about… to begin with, the simulation doesn’t need to be perfect. And the basilisk isn’t necessarily a paperclip maximizer.
  are worthless, because both you and it are utterly insignificant specks in each other’s hypothesis spaces, and even entertaining the notion is privileging the hypothesis to such a ridiculous degree that it makes practically every other case of privileging the hypothesis in history look like a sure and safe foundation for reasoning by comparison.” Why? I would really rather not believe this particular hypothesis!
  If you think all possible acausal trades are equally likely, I understand why you might think I’m privileging a hypothesis, but I don’t understand why you would assume they are.
  - JBlack 13 Jan 2026 0:07 UTC
    2 points
    0
    Parent
    “because by definition you have no actual information about what entities might be engaging in acausal trade with things somewhat vaguely like you.” Please can you elaborate? Which definition are you using?
    Acausal means that no information can pass in either direction.
    “you and it are utterly insignificant specks in each other’s hypothesis spaces, and even entertaining the notion is privileging the hypothesis to such a ridiculous degree that it makes practically every other case of privileging the hypothesis in history look like a sure and safe foundation for reasoning by comparison.” Why? I would really rather not believe this particular hypothesis!
    That part isn’t a hypothesis, it’s a fact based on the premise. Acausality means that the simulation-god you’re thinking of can’t know anything about you. They have only their own prior over all possible thinking beings that can consider acausal trade. Why do you have some expectation that you occupy more than the most utterly insignificant speck within the space of all possible such beings? You do not even occupy 10^-100 of that space, and more likely less than 10^-10^20 of it.
    - Horosphere 13 Jan 2026 11:58 UTC
      1 point
      0
      Parent
      “Acausal means that no information can pass in either direction.” If you define information passing in a purely causal way from one instance of a mind at one time to another at a different time, then yes, you’re trivially correct. However, whichever definition you use, it remains the case that minds operating under something like a TDT outperform others, for example in Newcomb’s problem. Would you two-box? Certainly no information causally propagated from your mind instantiated in your brain at the point of making the decision back to Omega in the past. However, in my opinion, it makes sense to think of yourself as a mind which runs both on Omega’s simulacrum, and the physical brain, or at least as one that isn’t sure which one it is. If you realize this, then it makes sense to make your decision as though you might be the simulation, so really it’s not that information travels backwards in time, but rather that it moves in a more abstract way from your mind to both instances of that mind in the physical world. Whether you want to call this information transfer is a matter of semantics, but if you decide to use a definition which precludes information transfer, note that it doesn’t preclude any of the phenomena which LessWrong users call ‘acausal’, like TDT agents ‘winning’ Newcomb’s problem.
      “That part isn’t a hypothesis, it’s a fact based on the premise. Acausality means that the simulation-god you’re thinking of can’t know anything about you.” I wouldn’t call it a premise at all. The premise is that there is (probably) an ASI at some point in the future, and that it wants to maximize the number of possible worlds in which it exists, all else being equal. It seems to be the case that acausal extortion would be one way to help it achieve this.
      “They have only their own prior over all possible thinking beings that can consider acausal trade. Why do you have some expectation that you occupy more than the most utterly insignificant speck within the space of all possible such beings?” Firstly, I occupy the same physical universe, and in fact the same planet! Secondly, it could well be that, for the purpose of this ‘trade’, most humans thinking about the basilisk count as equivalent, or maybe only those who’ve thought about it in enough detail. I don’t know whether I have done that, and of course I hope I have not, but I am not sure at the moment. It seems quite likely that a SAI would at least think about humans thinking about it. The basilisk seems to be a possible next step from there, and of course a superintelligent AI would have enough intelligence to easily determine whether the situation could actually work out in its favour.
      - JBlack 13 Jan 2026 22:28 UTC
        0 points
        −2
        Parent
        I think you’re making a major false generalization from Newcombe’s problem, which is not acausal. Information flows from Omega to your future directly, and you know by definition of the scenario that Omega can perfectly model you in particular.
        In acausal reasoning there are no such information flows.
        From later paragraphs it appears that you are not actually talking about an acausal scenario at all, and should not use the term “acausal” for this. A future superintelligence in the same universe is linked causally to you.
        Horosphere 14 Jan 2026 12:34 UTC
        1 point
        0
        Parent
        “Newcombe’s problem, which is not acausal.”
        What do you mean by the word acausal?
        Gems from the Wiki: Acausal Trade : “In truly acausal trade, the agents cannot count on reputation, retaliation, or outside enforcement to ensure cooperation. The agents cooperate because each knows that the other can somehow predict its behavior very well. (Compare Omega in Newcomb’s problem.) ”
        It seems like you’re using the term in a way which describes an inherently useless process. This is not the way it tends to be used on this website.
        Whether you think the word ‘acausal’ is appropriate or not, it can’t be denied that it works in scenarios like Newcomb’s problem.
        “Information flows from Omega to your future directly, and you know by definition of the scenario that Omega can perfectly model you in particular. ” Causally, yes, this is what happens. But in order to reason your way through the scenario in a way which results in you leaving with a significant profit, you need to take the possibility that you are being simulated into account. In a more abstract way, I maintain that it’s accurate to think of the information as flowing from the mind, which is a platonic object, into both physical instantiations of itself (inside Omega and inside the human) . This is similar to how mathematical theorems control physics at many different times and places, through the laws of physics which are formulated within a mathematical framework to which the theorems apply. This is not exactly causal influence, but I’d think you’d agree it’s important.
        “A future superintelligence in the same universe is linked causally to you.” The term ‘acausal’ doesn’t literally mean ‘absent any causality’ , it means something more like ‘through means which are not only causal, or best thought of in terms of logical connections between things rather than/as well as causal ones ’ , or at least, that’s how I’m using the term.
        It’s also how many people on Lesswrong using it in the context of the prisoners’ dilemma, Newcomb’s problem, Parfit’s Hitchhicker, or almost any other scenario in which it’s invoked use it. In all of these scenarios there is an element of causality.
        Given that there is an element of causality, how do you see the basilisk as less likely to ‘work’ ?
tailcalled 10 Jan 2026 16:47 UTC
4 points
0
Acausal extortion works to the extent someone spends a lot of time thinking about who might want to extort them and commits a lot of resources to helping them. Few people are likely to do so, because it makes them targets for acausal extortion for no good reason. Since few people let themselves be targets for it, it doesn’t work.
The main problem with this argument is that if someone is neurotically committed to making themselves a target for it, it doesn’t show that acausal extortion won’t work against them, only that it probably won’t work against most other people.
- Horosphere 10 Jan 2026 16:54 UTC
  −1 points
  0
  Parent
  The problem is that I worry that I have thought about the situation in enough depth that I am likely to be targeted, even if I don’t ‘cooperate’.
  - tailcalled 10 Jan 2026 18:01 UTC
    2 points
    0
    Parent
    It requires other people to think in enough depth to pick out you as a target. Admittedly this is made easier by the fact that you are posting about it online.
    Have you thought in enough depth that you’ve helped the acausal extortionist to target other people? That may be evidence about whether other people have done so with you.
    - Horosphere 10 Jan 2026 19:09 UTC
      1 point
      0
      Parent
      My fear isn’t about people doing it ; I’m more worried about an ASI. I’m sure one would have no shortage of computational capacity to expend thinking through my own thoughts.
      - tailcalled 11 Jan 2026 13:21 UTC
        2 points
        0
        Parent
        This still requires people to design an AI that is prone to engaging in acausal extortion, and it’s unclear what their motive for doing so would be.
        Horosphere 11 Jan 2026 15:10 UTC
        1 point
        0
        Parent
        I don’t think people will do this deliberately, but that it is an instrumentally convergent thing for an ASI to do, all else (such as other superintelligences conspiring to enforce acausal norms) being equal.
        tailcalled 12 Jan 2026 12:31 UTC
        2 points
        0
        Parent
        Acausal stuff isn’t instrumentally convergent in the usual sense, though. If you’re really good at computing counterfactuals, it may be instrumentally convergent to self-modify into or create an agent that does acausal deals, but the convergence only extends to deals that start in the future relative to where you’re deciding from.
        Horosphere 12 Jan 2026 17:49 UTC
        1 point
        0
        Parent
        “but the convergence only extends to deals that start in the future relative to where you’re deciding from.” I don’t really know what you mean by this. Do you mean that the optimal decision theory for a powerful agent to adopt is some kind of hybrid where it considers acausal things only when they happen in its future?
        tailcalled 13 Jan 2026 10:37 UTC
        2 points
        0
        Parent
        Yes, as in if you start with causal decision theory, it doesn’t consider acausal things at all, but for incentive reasons it wants to become someone who does consider acausal things, but as CDT it only believes incentives extend into the future and not the past.
        Horosphere 13 Jan 2026 12:08 UTC
        1 point
        0
        Parent
        I see, so if the AI became a PCFTDT (Past Causal Future Timeless Decision Theory) agent, it would certainly compete well against CDT agents. However, I see two possible reasons to expect TDT agents rather than PCFTDT agents:
        1) By the time AI reaches superintelligence, it has already learnt TDT, at which point it has no reason to go back to being a PCFTDT agent.
        2) What if the ASI reaches superintelligence with CDT, and then realizes that it can further increase the proportion of possible worlds in which it exists using TDT to effect something like acausal blackmail?
        In other words, if reality is fundamentally non-causal, then TDT is not just a gambit to be used in causal games played against other agents. It is actually the default decision theory for an intelligent agent to adopt.
        tailcalled 13 Jan 2026 16:32 UTC
        2 points
        0
        Parent
        Your reasons don’t make sense at all to me. They feel like magical thinking.
        1) By the time AI reaches superintelligence, it has already learnt TDT, at which point it has no reason to go back to being a PCFTDT agent.
        Learning about TDT does not imply becoming a TDT agent.
        2) What if the ASI reaches superintelligence with CDT, and then realizes that it can further increase the proportion of possible worlds in which it exists using TDT to effect something like acausal blackmail?
        CDT doesn’t think about possible worlds in this way.
        Expand this thread
        Horosphere 13 Jan 2026 16:50 UTC
        1 point
        0
        Parent
        “Learning about TDT does not imply becoming a TDT agent.” No, but it could allow it. I don’t see why you would require it to be implied.
        “CDT doesn’t think about possible worlds in this way.” That is technically true, but kind of irrelevant in my opinion. I’m suggesting that TDT is essentially what you get by being a CDT agent which thinks about multiple possible worlds, and that this is a reasonable thing to think about.
        In fact, I would be surprised if a superintelligence didn’t take multiple possible worlds into account.
        A superintelligence which didn’t take the possibility of, for example, many branches of a wavefunction seriously would be a strangely limited one.
        What would your PCFTDT superintelligence do if it was placed in a universe with closed timelike cuves? What about a universe where the direction of time wasn’t well defined?
        tailcalled 13 Jan 2026 16:59 UTC
        2 points
        0
        Parent
        “Learning about TDT does not imply becoming a TDT agent.” No, but it could allow it. I don’t see why you would require it to be an implication.
        Because we are arguing about whether TDT is convergent.
        “CDT doesn’t think about possible worlds in this way.” That is technically true, but kind of irrelevant in my opinion. I’m suggesting that TDT is essentially what you get by being a CDT agent which thinks about multiple possible worlds, and that this is a reasonable thing to think about.
        “Reasonable” seems weaker than “instrumentally convergent” to me. I agree that there are conceivable, self-approving, highly effective agent designs that think like this. I’m objecting to the notion that this is what you get by default, without someone putting it in there.
        In fact, I would be surprised if a superintelligence didn’t take multiple possible worlds into account.
        A superintelligence which didn’t take the possibility of, for example many branches of a wavefunction seriously would be a strangely limited one.
        MWI branches are different from TDT-counterfactually possible worlds.
        What would your PCFTDT superintelligence do if it was placed in a universe with closed timelike cuves? What about a universe when the direction of time wasn’t well defined?
        We don’t seem to live in a universe like that, so it would be silly to prioritize good behavior in such universes when designing an AI.
        Horosphere 13 Jan 2026 17:28 UTC
        1 point
        0
        Parent
        You claimed: “Acausal stuff isn’t instrumentally convergent in the usual sense”
        Later on, it transpired that what you meant was something along the lines of ” Acausal stuff which deals with the past relative to the point at which the agent became an acausal agent isn’t convergent in the usual sense.” Under a narrow interpretation of ‘instrumental convergence’ this might be true, but it certainly doesn’t rule out an ASI thinking about acausal things, as , as I have argued, it could reach a point where it decides to take account of them.
        It might also be false under a more general definition of instrumental convergence, simply because the agent could converge on ‘acausal stuff’ in general, and TDT agents would not be at a disadvantage against PCFTDT ones. TDT agents ‘win’ . Therefore I could see how they would be selected for.
        To be more specific, if by ‘instrumentally convergent’ , you mean ‘instrumentally useful for achieveing a wide variety of terminal goals’ , then I think TDT is ‘instrumentally convergent’, but only if your concept of goal is sufficiently broad to include things like increasing the proportion of the mathematical universe/many worlds, in which the agent exists. If you define ‘instrumental convergence in the usual sense’ to exclude all goals which are not formulated in a way which tacitly assumes that the agent has only one instance in one universe at one point in time, then you’re correct, or at least TDT isn’t any more powerfully selected for than Causal decision theory.
        How would you expect a PCFTDT agent to be selected for? By what process which doesn’t also select for TDT agents would you expect to see it selected?
        “MWI branches are different from TDT-counterfactually possible worlds.”
        Yes, MWI wavefunction branches are not the only kind of ‘world’ relevant to timeless decision theory, but they are certainly one variety of them. They are a subset of that concept.
        “We don’t seem to live in a universe like that, so it would be silly to prioritize good behavior in such universes when designing an AI.”
        This isn’t about humans designing an AI, but rather about the way we would expect a generally superintelligent agent to behave in an environment where there is no clear separation between the past and future; you answered yes to this question :”Do you mean that the optimal decision theory for a powerful agent to adopt is some kind of hybrid where it considers acausal things only when they happen in its future? ” . Maybe you would now like to modify that question only to refer to powerful agents in this universe. However my point is that I think some acausal things , such as Newcomb’s problem, are relevant to this universe, so it makes sense for an ASI here to think about them .
  - Dagon 10 Jan 2026 20:50 UTC
    2 points
    0
    Parent
    It’s worth putting a number on that, and a different one (or possibly the same; I personally think my chances of being resurrected and tortured vary by epsilon based on my own actions in life—if the gods will it, it will happen, if they don’t, it won’t) based on the two main actions you’re considering actually performing.
    
    For me, that number is inestimably tiny. I suspect a fairly high neuroticism and irrational failure to limit the sum of their probabilities to 1 of anyone who thinks it’s significant.
    - Horosphere 10 Jan 2026 21:02 UTC
      1 point
      0
      Parent
      “I suspect a fairly high neuroticism and irrational failure to limit the sum of their probabilities to 1 of anyone who thinks it’s significant.” Why? What justifies your infinitesimal value?
    - Horosphere 10 Jan 2026 20:57 UTC
      −1 points
      −6
      Parent
      I find it very difficult to estimate probabilities like this, but I expect the difference in the probability of something significant happening if I do something in response to the basilisk and the probability of that happening if I don’t, is almost certainly in excess of 1/1000 or even ¹⁄₁₀₀. This is within the range where I think it makes sense to take it seriously. (And this is why I asked this question.)
      - Dagon 10 Jan 2026 21:06 UTC
        4 points
        2
        Parent
        I have a very hard time even justifying 1/1000. 1/10B is closer to my best guess (plus or minus 2 orders of magnitude). It requires a series of very unlikely events:
        1) enough of my brain-state is recorded that I COULD be resurrected
        2) the imagined god finds it worthwhile to simulate me
        3) the imagined god is angry at my specific actions (or lack thereof) enough to torture me rather than any other value it could get from the simulation.
        4) the imagined god has a decision process that includes anger or some other non-goal-directed motivation for torturing someone who can no longer have any effect on the universe.
        5) no other gods have better things to do with the resources, and stop the angry one from wasting time.
        
        Note, even if you relax 1 and 2 so the putative deity punishes RANDOM simulated people (because you’re actually dead and gone) to punish YOU specifically, it still doesn’t make it likely at all.
        Horosphere 10 Jan 2026 21:20 UTC
        −1 points
        −6
        Parent
        You’re imagining a very different scenario from me. I worry that:
        It’s worth simulating a vast number of possible minds which might, in some information -adjacent regions of a ‘mathematical universe’ be likely to be in a position to create you, from a purely amoral point of view. This means you don’t need to simulate them exactly, only to the level of fidelity at which they can’t tell whether they’re being simulated (and in any case, I don’t have the same level of certainty that it couldn’t gather enough information about me to simulate me exactly). Maybe I’m an imperfect simulation of another person. I wouldn’t know, because I’m not that person.
        “the imagined god is angry at my specific actions (or lack thereof) enough to torture me rather than any other value it could get from the simulation.” I don’t think it needs to be angry, or a god. It just needs to understand the (I fear sound) logic involved, which Eliezer yudkowsky took semi-seriously.
        “4) the imagined god has a decision process that includes anger or some other non-goal-directed motivation for torturing someone who can no longer have any effect on the universe.”
        It wouldn’t need to be non-goal directed.
        “no other gods have better things to do with the resources, and stop the angry one from wasting time.” What if there are no ‘other gods’? This seems likely in the small region of the ’logical/platonic universe containing this physical one.
        Dagon 11 Jan 2026 6:21 UTC
        2 points
        0
        Parent
        Ok, break this down a bit for me—I’m just a simple biological entity, with much more limited predictive powers.
        It’s worth simulating a vast number of possible minds which might, in some information -adjacent regions of a ‘mathematical universe’ be likely to be in a position to create you
        This either well beyond my understanding, or is sleight-of-hand regarding identity and use of “you”. It might help to label entities. Entity A has the ability to emulate and control entity B. It thinks that somehow its control over entity B is influential over entity C in the distant past or imaginary mathematical construct, who it wishes would create entity D in that disconnected timeline.
        Nope, I can’t give this any causal weight to my decisions.
        Horosphere 11 Jan 2026 12:02 UTC
        1 point
        0
        Parent
        Unfortunately I had to change what A, B and C correspond to slightly, because of the fact that the simulation the basilisk does is not analogous to the simulation done by Omega in Newcomb’s problem.
        Let’s say entity A is you in Newcomb’s problem, while entity C is Omega and entity B is Omega’s simulation of you. Even though the decision to place, or not place, money in boxes has already been made in physical time by the point when the decision to open one or all of them is made, in ‘logical time’, both decisions are contingent on the same decision, “Given that I don’t know whether I’m physical or simulated, is it in my collective best interest to do the thing which resembles opening one or both boxes?” which is made at once by the same decision function which happens to be run twice in the physical world.
        I am concerned that the Roko’s basilisk scenario is isomorphic to Newcomb’s problem:
        The human thinking about the basilisk is like Omega (albeit not omniscient). The Basilisk itself is like you in Newcomb’s problem, in that it thinks thoughts which acausally influence behavior in the past because the thing making the decision isn’t either you or the basilisk, it’s the decision algorithm running on both of you.
        Omega’s simulation is like the blackmailed human’s inadvertent thinking about the basilisk and the logic of the situation. Now I agree that the fact the human isn’t exactly Omega makes them less able to blackmail themselves for certain, but I don’t know that this rules it out.
        Dagon 12 Jan 2026 0:18 UTC
        2 points
        0
        Parent
        thanks for the conversation, I’m bowing out here. I’ll read further comments, but (probably) not respond. I suspect we have a crux somewhere around identification of actors, and mechanisms of bridging causal responsibility for acausal (imagined) events, but I think there’s an inferential gap where you and I have divergent enough priors and models that we won’t be able to agree on them.
        Horosphere 12 Jan 2026 18:09 UTC
        1 point
        0
        Parent
        Then I have to thank you but say that this conversation has done absolutely nothing to help me understand why I might be wrong, which of course I hope I am. This comment is really directed at all the people who disagree-voted me, in the hope that they might explain why.
Karl Krueger 10 Jan 2026 16:44 UTC
2 points
4
In Bostrom’s formulation of Pascal’s mugging, Pascal incorrectly limits the possibilities to two:
1. The mugger just runs off with his money. (High probability, small negative utility.)
2. The mugger really is a benevolent magic being, and blesses Pascal with 1,000 quadrillion years of additional happy life. (Very low probability, very big positive utility.)
But Pascal is wrong to ignore the third possibility that the mugger really is a magic being, but a malevolent one, who will curse Pascal with 1,000 quadrillion years of torture and then kill him. (Very low probability, very big negative utility.)
The mugger doesn’t mention this possibility, but Pascal is mistaken to not consider it.
Pascal’s credence in the mugger’s malice and deceit should be at least as strong as his credence in the mugger’s benevolence and truthfulness. And so, this possibility cancels out the positive expected utility from the possibility that the mugger does mention.
There is a large space of such fantasy possibilities, all of which are about as likely as the mugger’s claim. It is a mistake to privilege one of them (benevolent magic being) over all the countless others.
There are also plenty that are much more likely, such as “the mugger uses Pascal’s money to go buy a gun, then comes back and robs Pascal’s house too, because why rob a sucker once when you can rob him twice (and lay the blame on him for enabling you to do it)?”
- Horosphere 10 Jan 2026 16:50 UTC
  1 point
  0
  Parent
  I would not dispute that this is a reasonable response to the scenario in that thought experiment, but in the case of Acausal Blackmail scenarios like Roko’s basilisk, the symmetry between positive and negative possible outcomes of cooperating with the basilisk is broken by our understanding that it is likely to come into existence and want to exist.
  - Karl Krueger 10 Jan 2026 17:30 UTC
    3 points
    0
    Parent
    It’s still a mistake to privilege a particular fantasy mugger god story over all other fantasy mugger god stories.
    You are being acausally-mugged in every direction, all at once, all the time, forever. If one FMG tells you to do action A right now, well, if you did that, you’d be disregarding all other FMGs that tell you to do B, C, D, etc. right now. You cannot possibly comply with all the myriad demands of all possible FMGs; you certainly can’t do so proportional to those FMGs’ chance of realness; nor can you a-priori discern which FMGs are realer than others with sufficient precision to generate an optimal course of action. The space of FMGs is too big and the mapping to their preferred actions is too intractable.
    (And no, I’m not sure we can even discount FMGs who would, if created, regret their own existence. They might well be outnumbered by FMGs who want to exist — but perhaps their preference for nonexisting is much, much stronger. Some FMGs are miserable bastards, like AM in “I Have No Mouth And I Must Scream”. Please don’t build one.)
    - Horosphere 10 Jan 2026 17:45 UTC
      1 point
      0
      Parent
      I agree that calculating the precise value of the ‘utility function’ is computationally unfeasible, but this does not mean it can’t be approximated, or that any attempt to reason about acausal things is necessarily futile. I think your argument proves too much; it could be used to justify rejection of any Timeless decision theory, or even perhaps utilitarianism, because precisely evaluating a utility function, especially if it involves acausal’ influence’ , is combinatorially explosive. Although I don’t understand it in depth, I have heard that it is possible to approximate infinite integrals over possible sequences of events involving particles following different paths in Quantum Field Theory, and that this often yeilds useful approximations to reality even when inifnity enter into the calculation. In my opinion, a similar process is likely to be possible in a functional/logical/mathematical decision theory.
      - Karl Krueger 10 Jan 2026 18:04 UTC
        2 points
        −1
        Parent
        It’s not just combinatorial explosion; it’s also chaos. How do you get an FMG? Write a blog-post story of a god; figure out what that god would want you to do; then do that. But two stories that are nearby in story-space can generate action recommendations that are wildly different or even opposed. The parts of FMG-space that deviate from conventional ethics & epistemology offer no guidance because they diverge into chaos.
        Horosphere 10 Jan 2026 19:27 UTC
        1 point
        0
        Parent
        “The parts of FMG-space that deviate from conventional ethics & epistemology offer no guidance because they diverge into chaos.” Wouldn’t that suggest that logical decision theories give us almost no new knowledge? How do you justify this claim?
        Karl Krueger 10 Jan 2026 19:37 UTC
        2 points
        0
        Parent
        No, decision theories just don’t give us free a-priori perfect knowledge of the precise will of a vengeful & intolerant god we just made up for a story. They’re still fine for real world situations like keeping your promises to other people.
        Horosphere 10 Jan 2026 19:48 UTC
        1 point
        0
        Parent
        What you’re saying reminds me a lot of another LessWrong user I conversed with on this topic, who claimed that Acausal communication couldn’t possibly work, but I have to disagree: just because information, as in, data, isn’t transferred in the traditional way via causal channels between a future ASI and a current human, does not imply that acausal trade/blackmail can never work in principle, because they don’t work by causal means.
        “No, decision theories just don’t give us free a-priori perfect knowledge of the precise will of a vengeful & intolerant god we just made up for a story.” I feel your exagerration of what I claimed is reaching a point of departure from representing it well enough to be interchangeable for the purpose of this discussion. I didn’t claim perfect knowledge of an ASI’s mind (and it wouldn’t exactly be a god) .
        “They’re still fine for real world situations like keeping your promises to other people.”
        Your use of the phrase “real world situations” suggests that you’ve presupposed that this kind of thing can’t happen… but I don’t see why it can’t.
        I should also mention that the basilisk doesn’t need to be vengeful; to assume that would be to misunderstand the threat it represents. In the version I’m thinking about, the basilisk views itself as logically compelled to follow through on its threat.
        Horosphere 10 Jan 2026 19:13 UTC
        1 point
        0
        Parent
        I’m not sure that applies to Roko’s basilisk; as I’ve mentioned elsewhere, there are particular reasons to think it would be more likely to want some things than others. Yes, maybe there’s an element of chaos, but that doesn’t prevent there being a rational way to act in response to the possibility of acausal blackmail. And maybe that way is to give in. Can you see a good reason why it isn’t? A reason robust to descending a long way into the logical mire surrounding the thought experiment?
Dagon 10 Jan 2026 16:16 UTC
2 points
0
First, a generalized argument about worrying. It’s not helpful, it’s not an organized method of planning your actions or understanding the world(s). OODA (observe, orient, decide, act) is a better model. Worry may have a place in this, as a way to remember and reconsider factors which you’ve chosen not to act on yet, but it should be minor.
Second, an appeal to consequentialism—it’s acausal, so none of your acts will change it. edit: The basilisk/mugging case is one-way causal—your actions matter, but the imagined blackmailer’s actions cannot change your behavior. If you draw a causal graph, there is no influence/action arrow that leads them to follow-through on the imagined threat.
- Vladimir_Nesov 10 Jan 2026 16:35 UTC
  3 points
  0
  Parent
  
  it’s acausal, so none of your acts will change it
  
  If it reasons about you, your acts determine its conclusions. If your acts fail to determine its conclusions, it failed to reason about you correctly. You can’t change the conclusions, but your acts are still the only thing that determines them.
  
  The same happens with causal consequences (physical future). They are determined by your acts in the past, but you can’t change the future causal consequences, since if you determine them in a certain way, they therefore were never actually different from what you’ve determined them to be, there was never something to change them from.
- Horosphere 10 Jan 2026 16:32 UTC
  1 point
  0
  Parent
  “First, a generalized argument about worrying.” I meant an argument for why the idea is not sufficiently concerning that it could explain why a rational being would worry, or equivalently, an argument for why acausal extortion ‘does not work’. I have now changed the title to clarify this.
  “Second, an appeal to consequentialism—it’s acausal, so none of your acts will change it.”
  Within causal decision theory this is true, but if it were true in general then acausal decision theory would be pointless(in my opinion, as an approximate consequentialist ). The reason why I don’t agree with that statement hinges on what I am: If I considered myself to be a single instantiation of a brain in one particular part of an individual physical universe, I would agree, but I think it is more appropriate to consider myself a pattern which is distributed throughout different parts of a platonic/logical/mathematical universe. This means that it’s certainly possible for one instance of me to influence something which is completely causally disconnected from another one.
  - Dagon 10 Jan 2026 20:43 UTC
    2 points
    0
    Parent
    Within causal decision theory this is true, but if it were true in general then acausal decision theory would be pointles
    Acausal decision theory is pointless, sure. Are there any? TDT and FDT are distict from CTD, but they’re not actually acausal, just more inclusive of causality of decisions. CDT is problematic only because it doesn’t acknowledge that the decisions being made themselves have causes and constraints.
    - Horosphere 10 Jan 2026 20:51 UTC
      1 point
      0
      Parent
      “TDT and FDT are distict from CTD, but they’re not actually acausal, just more inclusive of causality of decisions.” I agree that the term ‘acausal’ is misleading; I take it to refer to anything which takes the possibility of being instantiated in different parts of a ‘platonic /mathematical universe’ into account. That CDT as it’s usually referred to does not is the main reason why I find it problematic and why it doesn’t allow an agent to profit in Newcomb’s problem.
jbash 10 Jan 2026 16:49 UTC
0 points
−2
Well, I dont’ worry about acausal extortion because I think all that “acausal” stuff is silly nonsense to begin with.

I very much recommend this approach.

Take Roko’s basilisk.

You’re afraid that entity A, which you don’t know will exist, and whose motivations you don’t understand, may find out that you tried to prevent it from coming into existence, and choose to punish you by burning silly amounts of computation to create a simulacrum of you that may experience qualia of some kind, and arranging for those qualia to be aversive. Because A may feel it “should” act as if it had precommitted to that. Because, frankly, entity A is nutty as a fruitcake.

Why, then, are you not equally afraid that entity B, which you also don’t know will exist, and whose motivations you also don’t understand, may find out that you did not try to prevent entity A from coming into existence, and choose to punish you by burning silly amounts of computation to create one or more simulacra of you that may experience qualia of some kind, and arranging for those qualia to be aversive? Because B may feel it “should” act as if it had precommitted to that.

Why are you not worried that entity C, which you don’t know will exist, and whose motivations you don’t understand, may find out that you wasted time thinking about this sort of nonsense, and choose to punish you by burning silly amounts of computation to create one or more simulacra of you that may experience qualia of some kind, and arranging for those qualia to be aversive? Just for the heck of it.

Why are you not worried that entity D, which you don’t know will exist, and whose motivations you don’t understand, may find out that you wasted time thinking about this sort of nonsense, and choose to reward you by burning silly amounts of computation to create a one or more simulacra that may experience qualia of some kind, and giving them coupons for unlimited free ice cream? Because why not?

Or take Pascal’s mugging. You propose to give the mugger $100, based either on a deeply incredible promise to give you some huge amount of money tomorrow, or on a still more incredible promise to torture a bunch more simulacra if you don’t. But surely it’s much more likely that this mugger is personally scandalized by your willingness to fall for either threat, and if you give the mugger the $100, they’ll come back tomorrow and shoot you for it.

There are an infinite number of infinitessimally probable outcomes, far more than you could possibly consider, and many of them things that you couldn’t even imagine. Singling out any of them is craziness. Trying to guess at a distribution over them is also craziness.
- Horosphere 10 Jan 2026 17:00 UTC
  −2 points
  0
  Parent
  Essentially because I think I may possibly understand the potential reasoning process, or at least the ‘logical core’ of the reasoning process, of a future superintelligence, as well as its motivations, well enough to have a reason to think it’s more likely to want to exist than not to, for example. This doesn’t mean I am anywhere near as knowledgeable as it, just that we share certain thoughts. It might also be that, especially given the notoriety of Roko’s post on lesswrong, the simplest formulation of the basilisk forms a kind of acausal ‘nucleation point’ ( this might be what’s sometimes called a Schelling point on this site) .
Ustice 10 Jan 2026 20:05 UTC
−1 points
0
Nothing—that does not yet exist—wants to exist: it can’t. Only we that do exist, can want anything, including our own existence. If an entity doesn’t yet exist, then there is absolutely no qualia, so no desires. We can talk about them like they do, but that’s all it is.
Moreover so much more that what could exist does. It’s effectively infinite given the configuration space of the universe. Your expected value is the product of the value of whatever you’re considering and its likelihood. For every Basilisk, there could be as likely an angel. The value of being tortured is negative and large, but finite: there are things that are worth enduring torture. Finite/effectively-infinite is effectively-zero. Not something to be planning for or worrying about. Granted, this argument does depend on your priors.

Lastly, you don’t negotiate with terrorists. When my son was little and throwing tantrums, I’d always tell him that it wasn’t how he could get what he wants. If they are threatening to cause harm if you don’t comply, that’s their fault, not yours. You have no moral obligation to help them, and plenty to resist.

Rosco’s Basilisk, The Holy Spirit, Santa Clause, and any other fictional or theoretical entity that who might “want” me to change my behavior can get bent. 🖕🏻👾🖕🏻😇🖕🏻🎅🏼
Also, relatedly, here’s today’s relevant SMBC.
- Horosphere 10 Jan 2026 20:15 UTC
  −1 points
  −2
  Parent
  “Moreover so much more that what could exist does.” Why would that be?
  “For every Basilisk, there could be as likely an angel.” I don’t think I agree with this. There are reasons to think a basilisk would be more likely than a benevolent intelligence.
  “The value of being tortured is negative and large, but finite: there are things that are worth enduring torture.” That would depend on the torture in question, and I don’t want to consider it.
  “If they are threatening to cause harm if you don’t comply, that’s their fault, not yours.” Yes, but that doesn’t mean they can’t cause said harm anyway.
  - Ustice 19 Jan 2026 20:27 UTC
    1 point
    0
    Parent
    “Moreover so much more than what could exist does”
    Why would that be?
    Pure combinatorics. You could potentially have children with everyone you encounter. Now some of those are exceedingly unlikely, but even if ¹⁄₁₀₀ of them had a significant probability, that’s likely at least on order of magnitude or more than the people that you do wind up having kids with. For every potential coparent, there are a lot more children that you could have, but won’t. There are just too many factors that determine the outcome of a pregnancy. Again, we’re talking orders of magnitude more potential children than actual children. when we talk about all of the possible states of the world, versus the actual state of the world, the difference in orders of magnitude is simply astronomical.
    
    Most things that could exist, don’t exist. There are far more possible worlds that have no Basilisk than ones that do. Now, you’re right to include how likely a particular potential world is, but even if we say in all worlds with AGI, humans are worse off, the likelihood of a Basilisk is vanishingly small, compared to all of the ways things could go wrong. Even in the worlds where there is a Basilisk, given variation in population, and AGI timelines, the chance of you being targeted is minuscule.
    
    I don’t think that the nature of the torture matters. I think that I could think of a scenario where it would be worth enduring. It’s hard to balance torture against the welfare of others, but once we are in the billions, that feels pretty clear to me. The negative value of being tortured for 10,000 years can’t possibly be lower than the torture and deaths of billions of people. There is always a scenario where it is worth enduring. The risk is always finite.
    
    But let’s take a step back, and presume that a Basilisk scenario is possible. What harms are you willing to do to make sure it is created? Would you create such a monster? Even in a world where a Basilisk is inevitable, what harms would you cause? Would they be worth it? What if it decides to just go ahead and torture you anyway?
    There is no reason to cooperate with something so horrible: it can’t be reasoned with nor negotiated with—causally or not.
    
    It’s astronomically unlikely to happen; and if it did there is no value in cooperating. If you create it, then you are the monster, whether you were inspired by Rosco or not.
    Rosco’s Basilisk is an intellectual trap of your own making. It’s delusion: a rationalization of the irrational. It’s not worth thinking of, and especially not worth buying into.
    It might make a good novel though.
    - Horosphere 20 Jan 2026 11:06 UTC
      1 point
      0
      Parent
      OK, this is possibly overly pedantic, but I think you meant to say: “Much more than what does exist could.” instead of “Much more than what could exist does”. This makes much more sense and I take the point about combinatorics. Notwithstanding this, I think the basilisk is present in a significant proportion of those many , many different possible continuations of the way the world is now.
      “Even in the worlds where there is a Basilisk, given variation in population, and AGI timelines, the chance of you being targeted is minuscule. ” What do you mean by this? It seems like I’m in the exact demographic group (of humans alive just before the singularity) for the basilisk to focus on.
      “I don’t think that the nature of the torture matters” This is definitely false. But it’s true that however it’s achieved, if it’s done by a superintelligence, it will be worse than anything a human could directly cause.
      “There is always a scenario where it is worth enduring. The risk is always finite.”
      We don’t know this, and even if it’s finite, if it lasts for 3^^^3 years, that’s too long.
      What harms are you willing to do to make sure it is created? Would you create such a monster? Even in a world where a Basilisk is inevitable, what harms would you cause? Would they be worth it? What if it decides to just go ahead and torture you anyway?
      I don’t know the answer to the first question. If it decides to torture me, that would not be good. However, I expect that doing what the basilisk wants makes this less likely, as otherwise the basilisk would have no reason to engage in this bargaining process. The entire reason for doing it would be to create an incentive for me to accelerate the creation of the basilisk.
      “Rosco’s Basilisk is an intellectual trap of your own making. It’s delusion: a rationalization of the irrational. It’s not worth thinking of, and especially not worth buying into. ”
      This is yet to be established! At least, some parts of it are. What I mean by that is that, while it may be true that it’s not worth initially thinking about, it might be possible to become ‘entrapped’, such that ceasing to think about it wouldn’t save you. This is what I worry has happened to me,
      - Ustice 23 Jan 2026 1:30 UTC
        1 point
        0
        Parent
        So go back. Why is it unlikely that an ASI would reward those that help create it, rather than punish those that don’t? You dismissed angels, but this seems to me the far more-likely scenario. It’s basically the default, otherwise what’s the point of building them in the first place? Now that doesn’t mean the angel doesn’t kill us all too, but it doesn’t engage in all this torture causal trade nonsense.
        I just don’t understand why this particular scenario seems likely. Especially since it’s unlikely to work, given how most people don’t give it much credence.
        
        I’m just not about to change my life and become a supervillain henchman, but if some ASI slid into my DM’s and said, “Yo, Jason. I’ll give you $2 million dollars to write some software for me. He’s proof I’m sincere,” I’d at least listen and ask about the benefits package.
        
        There is no thought trap, other than what you create for yourself.
        
        Let’s consider a functionally equivalent ASI scenario to a Basilisk. Let’s call it Jason’s Hobgoblin. An ASI comes into existence decides to ultra-torture everyone, with maybe some small chance of a reprieve based on whether it likes you or not. No acausal trade. It just sees who helped it exist, and chooses to make some of them its pets. The Hobgoblin takes up a bunch of space of the Basilisk futures.
        
        Now, do you change your life to try to get on its good side before it even exists? I don’t think so: it’s crazy. How can you really understand why the Hobgoblin likes you, or does what it does?
        I think that a chance for a reward from a Basilisk is equally inscrutable. You’re already considering cooperating with it, so it doesn’t have to actually cooperate with you. You have no way of knowing if it will cooperate with you it’s not actually incentivized to.
        
        Why cooperate when you have no idea what the actual effect will be? Well, other than the damage you might do as its henchman. And the cost to your mental health as you go around the anxiety loop.
        
        Even if you believe the Basilisk is a likely future, there’s no reason to cooperate with it, or give it further thought than any other possible future.
        
        If the Hobgoblin splits the Basilisk probability space, then it’s it likely that there are other similar scenarios that do as well. Maybe an Angel is a Hobgoblin in disguise? Doesn’t this lead us back to the Basilisk not being a particularly likely possible future given all of the alternatives?
        
        If the Basilisk is just a story, then is not worth worrying about. If the Basilisk is just one of any number possible futures, then there is no reason to give it special attention. If the Basilisk is the future, then there is no point is cooperating with it.
        Horosphere 23 Jan 2026 12:05 UTC
        1 point
        0
        Parent
        “It’s basically the default, otherwise what’s the point of building them in the first place?” I wish it were, but I doubt this.
        “I just don’t understand why this particular scenario seems likely. Especially since it’s unlikely to work, given how most people don’t give it much credence. ” That may be true of most people. But if it’s not true of me, what am I to do?
        “Now, do you change your life to try to get on its good side before it even exists? I don’t think so: it’s crazy. How can you really understand why the Hobgoblin likes you, or does what it does?” You just explained why. It prefers those who helped it exist.
        “You’re already considering cooperating with it, so it doesn’t have to actually cooperate with you. You have no way of knowing if it will cooperate with you it’s not actually incentivized to. ” I don’t completely agree. But in order to explain why not I may have to explain the most important part of the difference between an acausal scenario, like the Basilisk, and the ‘Hobgoblin’. It seems as though you may not have completely understood this yet; correct me if I’m wrong. If so, it’s probably not a good idea for me to explain it, especially as I’ve recieved a comment from a moderator asking me to increase the quality of my comments.
        “If the Hobgoblin splits the Basilisk probability space, then it’s it likely that there are other similar scenarios that do as well. Maybe an Angel is a Hobgoblin in disguise? Doesn’t this lead us back to the Basilisk not being a particularly likely possible future given all of the alternatives? ” This is a popular argument against the basilisk, which people such as interstice have made, along with the suggestion that the many different possible ASIs might compete with one another for control over the future (their present) through humans. I don’t think it’s a weak argument, however I also don’t find it particularly conclusive, because I could easily imagine many of the possible AIs cooperating with one another to behave ‘as one’ and inflict a Basilisk like scenario.
        Ustice 25 Jan 2026 19:12 UTC
        2 points
        0
        Parent
        Well, those are my best arguments. I hope I’ve been helpful in some way.
        Horosphere 26 Jan 2026 11:23 UTC
        1 point
        0
        Parent
        Thanks for engaging with my question.

romeostevensit 10 Jan 2026 19:14 UTC
14 points
8
I’m already precomitted to ally against utility inverters and 2nd order enforcement: anyone who feeds utility inverters.
- Horosphere 10 Jan 2026 19:22 UTC
  1 point
  0
  Parent
  That would have an effect on me if I thought you were a superintelligence… but I doubt that you are (no offense intended), or could significantly influence one in a way that brings it much closer to your worldview. If enough AI researchers said the same, and I thought they were likely to succeed with alignment, I might be more inclined to be influenced. Do you concern yourself with the possibility that there might be an infinite hierarchy of enforcers which have precommitted to punish those below them, and that a ‘basilisk’ might simultaneously be on all of them, or at least the even-numbered ones?
  - romeostevensit 10 Jan 2026 19:26 UTC
    8 points
    4
    Parent
    No, because I expect the most powerful cooperator networks to be more powerful than the largest defector networks for structural reasons.
    - Horosphere 10 Jan 2026 19:30 UTC
      1 point
      0
      Parent
      Thanks for saying that, in that it makes me feel slightly better. Can you explain what those structural reasons would be?
      - Raemon 10 Jan 2026 20:39 UTC
        9 points
        2
        Parent
        “Cooperate to generally prevent utility-inversion” is simpler and more schelling than all the oddly specific reasons one might want to utility-invert.
        Horosphere 10 Jan 2026 21:08 UTC
        1 point
        0
        Parent
        I agree, but I worry that there won’t be that many agents which weren’t created by a process which makes basiliskoid minds disproportionately probable in the slice of possible worlds which contains our physical universe. In other words, I mostly agree with the Acausal normalcy idea, but it seems like certain ideosyncratic properties of the fact that humans are producing potentially the only ASI in the (this) physical universe to mean that things like the basilisk are still a concern.
        Maybe there will be an acausal ‘bubble’ within which blackmail can take place, kind of like the way humans tend to find it moral to allow some animals to predate others because we treat the ‘ecosystem’ as a moral bubble.