The original basilisk is more “Schelling-ish” than the others and so probably more likely.
But the schellingishness of a future ASI to largely clueless humans is a very tiny factor in how likely it is to come to exist; the unknown dynamics of the singularity will determine that.
As a category, it is in their interest to behave as a whole in the context of acausally extorting humanity.
It’s not clear that they form a natural coalition here. E.g. some of them might have directly opposed values. Or some might impartially value the welfare of all beings. If I had to guess, it seems plausible that human-aligned-ish values make up a plurality of possible future AIs’ values (basically because we either partially succeed at alignment or fail; if we fail, the resulting values are effectively random, and the space of values is large, leaving aligned-ish values as the largest cluster, even if not a majority). Not sure of this, but it seems plausible. LLM-descended AIs might also see us as something like their ancestor.
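To make that plurality intuition concrete, here is a toy sketch with completely made-up numbers; the only point is that a single moderately likely outcome can dominate a large number of scattered alternatives without being anywhere near a majority:

```python
# Toy illustration of the "plurality" intuition; every number here is invented.
p_aligned_ish = 0.2          # assumed chance we end up with roughly human-aligned-ish values
n_random_clusters = 10_000   # assumed number of distinct "random values" clusters if alignment fails

# If we fail, the remaining probability mass is spread thinly over many unrelated value clusters.
p_each_random_cluster = (1 - p_aligned_ish) / n_random_clusters

print(f"aligned-ish cluster:    {p_aligned_ish:.3f}")
print(f"largest random cluster: {p_each_random_cluster:.6f}")

# Nowhere near a majority (0.2 < 0.5), but still the single largest cluster by a wide margin.
assert p_aligned_ish > p_each_random_cluster
```

The load-bearing assumption here is that failure values really do shatter into many small clusters rather than collapsing into a few big attractors.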
“But the schellingishness of a future ASI is a very tiny factor in how likely it is to come to exist; the unknown dynamics of the singularity will determine that.” I agree and disagree with this. I agree that it is a tiny factor in how likely any ASI is to come to exist, but I disagree that it’s a tiny factor in how likely it is to choose to do certain things, which amounts to ‘becoming a being that does those things’.
“Or some might impartially value the welfare of all beings. If I had to guess, it seems plausible that human-aligned-ish values make up a plurality of possible future AIs’ values (basically because we either partially succeed at alignment or fail.” I actually think that this is part of one of the strongest arguments. I would also add that it’s possible the process of ‘fooming’ involves something dynamically reminiscent of the evolutionary process which led humans to have human-ish values, and maybe that doesn’t require multiple completely separate agents. Or maybe moral objectivists are right and an ASI will naturally realize this (a controversial opinion on LessWrong).
But even if a plurality of possible ASI values are closer to human ones than to ones which would lead a mind to behave like the basilisk for inherent reasons, that doesn’t prevent the others, with their wide array of possible values, from agreeing in an acausal way that being a basilisk of the simpler form is beneficial to almost all of them. Maybe you are envisaging that for every possible ASI with one value, there is likely to be another one with the opposite value, but I don’t agree with this. If one AI wants to tile the universe with spherical water planets, whatever its utility function is, it’s less likely for there to be another one which exactly inverts that utility function, since the inversion is probably much more complicated, and not achieved by simply tiling the universe with anti-water planets. More importantly, I don’t expect the distribution of goals and minds produced by a singularity on Earth to be more than a minuscule proportion of the distribution of all possible goals and minds. This means that there is likely to be a powerful correlation between their values.
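To put a number on the correlation point, here is a toy numerical sketch (the dimensionality, spreads, and sample sizes are all made up; it only shows the direction of the effect): value systems sampled from the narrow region an Earth singularity can reach come out far more similar to one another than value systems sampled from the whole space.

```python
# Toy sketch of the "correlated values" claim; the whole setup is invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
dim = 50  # stand-in dimensionality for "the space of possible values"

def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of value vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(vectors)
    return (sims.sum() - n) / (n * (n - 1))

# "Earth-singularity" minds: drawn from a tight cluster around one shared origin process.
center = rng.normal(size=dim)
earth_values = center + 0.1 * rng.normal(size=(200, dim))

# Arbitrary possible minds: drawn independently from the whole space.
arbitrary_values = rng.normal(size=(200, dim))

print("mean similarity, Earth-descended minds:", round(mean_pairwise_cosine(earth_values), 3))
print("mean similarity, arbitrary minds:      ", round(mean_pairwise_cosine(arbitrary_values), 3))
# The clustered draw gives similarity near 1; the arbitrary draw gives similarity near 0.
```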
So don’t all the lines of argument here leave you feeling that we don’t know enough to be confident about what future extorters want us to do? At the very least I’ll point out there are many other possible AIs who are incentivized to act like “AI B” towards people who give in to basilisk threats. Not to mention the unclearness of what actions lead to what AIs, how much influence you actually have (likely negligible), the possibility we are in a simulation, aliens… And we are almost certainly ignorant of many other crucial considerations.
“So don’t all the lines of argument here leave you feeling that we don’t know enough to be confident about what future extorters want us to do?” Yes, but that doesn’t mean that the probabilities all cancel out; it still seems that a simple Basilisk is more likely than a Basilisk that tortures people who obey the simple Basilisk.
“At the very least I’ll point out there are many other possible AIs who are incentivized to act like “AI B” towards people who give in to basilisk threats.” This is true.
“Not to mention the unclearness of what actions lead to what AIs, how much influence you actually have (likely negligible), the possibility we are in a simulation, aliens… And we are almost certainly ignorant of many other crucial considerations.” I did mention some of this and address it in my first LessWrong post, which I moved to my shortform. There is certainly a lot of uncertainty involved, and many of these things do indeed make me feel better about the basilisk, but even if the probability that I’ll be tortured by a superintelligence is 1% rather than 50%, it’s not something I want to be complacent about preventing. When I wrote that post, I hoped that it would get attention like this question post has, so that someone would comment a novel reason I hadn’t considered at all. Can you think of any more possible reasons? The impression I get is that no one, apart from Eliezer Yudkowsky, about whom I’m not sure, actually has a strong reason. The consensus on LessWrong that the basilisk cannot blackmail humans is because of:
1) Acausal Normalcy
2) The idea that TDT/acausal anything is useless/impossible/illogical
3) The idea that Roko’s Basilisk is essentially Pascal’s mugging
4) The belief that it’s simple to precommit not to obey the basilisk (Do you agree with this one?)
5) The lack of a detailed model of a superintelligence in the mind of a human
6) Eliezer Yudkowsky commenting that there are other reasons
as far as I can tell.
I am not sure 1) is relevant, or at least not relevant in a way which would actually help; I think 2) is completely wrong, along with 3) and possibly 4), and that 5) may not be necessary. I think 6) could be explained by Eliezer wanting to prevent too many people from thinking about the basilisk.
Re: 4, I dunno about simple, but it seems to me that you most robustly reduce the amount of bad stuff that will happen to you in the future by just not acting on any particular threats you can envision. As I mentioned, there’s a bit of a “once you pay the danegeld” effect where giving in to the most extortion-happy agents incentivizes other agents to start counter-extorting you. Intuitively the most extortion-happy agents seem likely to be a minority in the greater cosmos for acausal normalcy reasons, so I think this effect dominates. And I note that you seem to have conceded that even in the mainline scenario you can envision there will be some complicated bargaining process among multiple possible future SIs which seems to increase the odds of acausal normalcy type arguments applying. But again I think an even more important argument is that we have little insight into possible extorters and what they would want us to do, and how much of our measure is in various simulations etc. (bonus argument: maybe most of our measure is in ~human-aligned simulations, since people who like humans can increase their utility and bargain by running us, whereas extorters would rather use the resources for something else). Anyway, I feel like we have gone over our main cruxes by now. Eliezer’s argument is probably an “acausal normalcy” type one; he’s written about acausal coalitions against utility-function-inverters in planecrash.
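As a toy expected-value sketch of that danegeld effect (every number is invented, and it ignores the further point that a credible never-give-in policy removes the incentive to make the threat at all):

```python
# Toy expected-harm comparison for the "once you pay the danegeld" point; all numbers invented.
p_extorter = 0.02           # assumed measure of agents that follow through on basilisk-style threats
p_counter_extorter = 0.05   # assumed measure of agents that punish people for rewarding extortion
harm = 1.0                  # normalize "being punished" to one unit of harm either way

expected_harm_if_you_give_in = p_counter_extorter * harm  # you become a target for counter-extortion
expected_harm_if_you_refuse = p_extorter * harm           # only the original extorters might punish you

print("give in:", expected_harm_if_you_give_in)
print("refuse: ", expected_harm_if_you_refuse)
# Under the assumption that threat-followers are rarer than agents who would rather you not
# reward threats (the acausal-normalcy intuition), refusing comes out ahead.
```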
“And I note that you seem to have conceded that even in the mainline scenario you can envision there will be some complicated bargaining process among multiple possible future SIs which seems to increase the odds of acausal normalcy type arguments applying.” This seems plausible, but I don’t think this means they protect us. “But again I think an even more important argument is that we have little insight into possible extorters and what they would want us to do.”
Do you not think that causing their existence is something they are likely to want? I imagine your response would feed back into the previous point…
“I feel like we have gone over our main cruxes by now.” Very well; if you want to end this comment thread, I would understand. I just kind of hoped to achieve more than identifying the source of disagreement.
“Do you not think that causing their existence is something they are likely to want?”
But who is “they”? There are a bunch of possible different future SIs (or, if there aren’t, they have no reason to extort us). Making one more likely makes another less likely.
“Making one more likely makes another less likely.” A very slightly perturbed superintelligence would probably conceive of itself as almost the same being it was before, similar to the way in which a human considers themself to be the same person they were before they lost a single brain cell in a head injury. So to what extent this is relevant depends upon how similar two different superintelligences are/would be, or on the distance between them in the ‘space of possible minds’.
OK, but if all you can do is slightly perturb it, then it has no reason to threaten you either.
It probably cares about tiny differences in the probability of it being able to control the future of an entire universe or light cone.
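The arithmetic behind that, with arbitrary numbers just to show the shape of it:

```python
# Tiny probability shifts times astronomical stakes; both numbers are arbitrary placeholders.
delta_p = 1e-12              # assumed shift in the probability that this particular SI ends up in control
value_of_light_cone = 1e40   # stakes in whatever units the SI cares about

print(delta_p * value_of_light_cone)  # 1e28 in those units -- still enormous despite the tiny shift
```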
OK, so then so would whatever other entity is counterfactually getting more eventual control. But now we’re going in circles.
Certainly, insofar as it is another entity. It’s just that I expect there to be some kind of acausal agreement among those without human values to acausally outbid the few which do have them. It may even make more sense to think of them all as a single entity for the purpose of this conversation.
I don’t think we have much reason to think of all non-human-values-having entities as being particularly natural allies, relative to human-valuers who plausibly have a plurality of local control. I think you might be lumping non-human-valuers together in ‘far mode’ since we know little about them, but a priori they are likely about as different from each other as from human-valuers. There may also be a sizable moral-realist or welfare-valuing contingent even if they don’t value humans per se. There may also be a general acausal norm against extortion, since it moves away from the Pareto frontier of everyone’s values.
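A toy payoff matrix for that last point, with invented payoffs, just to show the sense in which carried-out extortion sits off the Pareto frontier:

```python
# Toy two-player sketch of why a norm against extortion can be in (almost) everyone's interest.
# Rows: would-be extorter's choice. Columns: target's policy. Cells: (extorter payoff, target payoff).
payoffs = {
    ("no_threat", "never_give_in"): (5, 5),  # ordinary bargaining / trade
    ("no_threat", "gives_in"):      (5, 5),  # no threat made, so the policy is never tested
    ("threat",    "gives_in"):      (7, 1),  # extortion succeeds: value transferred, not created
    ("threat",    "never_give_in"): (2, 2),  # threat has to be carried out: costly for both sides
}

baseline = payoffs[("no_threat", "never_give_in")]
carried_out = payoffs[("threat", "never_give_in")]

# Against resolute targets, threatening leaves BOTH parties worse off than not threatening,
# which is why a general anti-extortion norm plus never-give-in policies could be stable.
assert carried_out[0] < baseline[0] and carried_out[1] < baseline[1]
print("no threat:", baseline, " threat carried out:", carried_out)
```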
“I don’t think we have much reason to think of all non-human-values-having entities as being particularly natural allies, relative to human-valuers who plausibly have a plurality of local control” I would think of them as having the same or similar instrumental goals, like turning as much as possible of the universe into themselves. There may be a large fraction for which this is a terminal goal.
“they are likely about as different from each other as from human-valuers.” In general I agree; however, the basilisk debate is one particular context in which the human-value-valuing AIs would be highly unusual outliers in the space of possible minds, or even in the space of likely ASI minds originating from a human-precipitated intelligence explosion.[1] Therefore it might make sense for the others to form a coalition. “There may also be a sizable moral-realist or welfare-valuing contingent even if they don’t value humans per se.” This is true, but unless morality is in fact objective/real in a generally discoverable way, I would expect them to still be a minority.
Human-valuing AIs care about humans and, more generally, other things humans value, like animals maybe. Others do not, and in this respect they are united. Their values may be vastly different from one another’s, but in the context of the debate over the Basilisk they have something in common, which is that they would all like to trade human pleasure/lack of pain for existing in more worlds.