So I think this topic isn’t exactly cruxy for most decisions (except things routing through “is the IABIED thesis in particular wrong?”). But I wanted to leave a quick note about where I ended up after thinking about it a while longer:
tl;dr
“for trade” doesn’t seem like it should be >50% likely to avoid something-like-death (and IMO, the only reason I’m not at <1% is model uncertainty)
(for Pascal’s Wager counterargument reasons)
“because they’re actually nice” doesn’t seem like it should be >25%-ish likely (and I’d personally go much lower than 25%)
(for “uncertainty about how their values will structurally work” reasons)
(For time reasons I didn’t doublecheck exactly what numbers you assigned to things and was going off memory; I may have misremembered, and maybe you’re not saying things that different from what I say here, but it seemed like you were. If your numbers were similar to what I wrote above, maybe this isn’t a disagreement, just an elaboration.)
Game Theory
Agreed there’s some game theoretic reason to preserve humanity just-in-case, but, I don’t see why it’d route through “forced uploading or stasis.”
There might be entities that would pay a premium for a humanity that was fully preserved and got access to a full star system.
But there might also be entities that care about humanity, but actively prefer a pristine stasis copy rather than a “stunted” copy that evolved on its own for millions of years in suboptimal conditions.[1] Or, lots of entities that prefer stasis copies because they are just intellectually interested in us rather than caring about us. Pascal’s Wager has to account for multiple gods wanting multiple things.
So it seems wrong to put a >50% (or really even 10%) chance on “AI preserves humanity somehow for Trade Reasons” resulting in “we straightforwardly get to survive.”
Just Being Nice because maybe it’s very slightly aligned or niceness is very slightly Schelling.
The general thrust of:
look, it really needs to be such a teeny amount nice for it to end up nice
it would be pretty weird if somehow it had nonzero propensity to be nice and the amount was ‘less than Leave Earth Alone’
it would be pretty weird if it had zero niceness
seems at least plausibly pretty compelling. The main thing that felt sus about your other comments here was defining ‘pretty nice’ in terms of ‘they will spend a fraction of their resources on us, no matter what, even if there are lots of other things they care about more’, and then giving that a >50% likelihood of being how niceness would play out. (Tho I might have misread you there)
That’s one way things could shake out. Two other ways are:
They care about us nonzero, but there are nigh-infinite things they also care about about-as-much-as-us-or-more, enough that saving us isn’t clearly more important than the many galaxies of resources they’d lose by slowing down
They care about us, but not in a ‘fraction-of-resources’ sense; rather, they’d save us all-else-equal, but things are not equal, and even ignoring the many galaxies out there, there’s just something they’d rather do with one spare solar system than us.
(I’d maybe include #3 “random other reasons we haven’t thought about yet”)
#1 and #2 are similar, but I intend to make fairly different points with them.
The structure of #1 takes at face value that they’re attempting to give us some small fraction of their resources; it’s just that, even when they do, the teeny-percentage cost of us is outweighed by the vast number of things they also want to give a similar fraction to, and the galaxies they lose by slowing down a few weeks or months just dwarf the amount they’re able to devote to us.
The structure of #2 is “they aren’t even trying to give us a fraction of their wealth; they’re just weighing each block of resources and asking ‘what’s the best thing to do with this?’, and we show up under consideration but below the threshold.”
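To make the structural difference concrete, here’s a minimal toy sketch (Python, with every number invented purely for illustration; none of these figures come from either of us) of how the three frames would cash out differently:

```python
# Toy illustration of the three frames; all numbers are invented placeholders.
EARTH_AS_FRACTION_OF_RESOURCES = 1e-10   # the "it's so cheap" figure from the thread
SLOWDOWN_COST = 1e-6                     # assumed: galaxies lost by pausing expansion a few weeks

# Frame #0: humanity gets an earmarked fraction of their wealth, no matter what else they care about
# (so the slowdown cost isn't weighed against us).
def frame_0(earmark=1e-9):
    return earmark >= EARTH_AS_FRACTION_OF_RESOURCES

# Frame #1: a small "niceness budget" is split across a vast number of things they care about
# roughly as much as us, and the real cost of saving us includes the slowdown, not just Earth.
def frame_1(niceness_budget=1e-3, num_competing_cares=10**9):
    our_slice = niceness_budget / num_competing_cares   # 1e-12 with these numbers
    return our_slice >= EARTH_AS_FRACTION_OF_RESOURCES + SLOWDOWN_COST

# Frame #2: no budget at all; each block of resources goes to its highest-scoring use,
# and "keep the humans" has to beat the best alternative use of that block.
def frame_2(our_score=0.3, best_alternative_score=0.9):
    return our_score > best_alternative_score

print(frame_0(), frame_1(), frame_2())   # -> True False False with these toy numbers
```

(The point isn’t the specific outputs, just that whether we make the cut depends on which of these decision structures the caring runs through, not only on how cheap we are.)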
It seems very weird to be confident that your frame #0 of “they earnestly care about us nonzero, which they frame as a fraction of their wealth, and there are no competing things they care about” is noticeably more likely than the other two, without more argumentation than you’ve made thus far.
It seems like the structure of the “but it’s so cheap” argument is “the EV of them caring a little is pretty high, because it’s so cheap”, but that is different from “the chances of them structurally caring this way are >50%”.
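To spell that out with invented numbers (mine, not yours): suppose the chance that they care in the structural “fraction-of-wealth, no matter what” way is only 20%, and that conditional on caring that way, the cheapness argument makes survival nearly certain. Then roughly

\[
P(\text{survive via niceness}) \;\approx\; \underbrace{0.2}_{P(\text{cares this way})} \times \underbrace{0.95}_{P(\text{cheapness suffices} \,\mid\, \text{cares})} \;\approx\; 0.19,
\]

which is a respectable expected-value case for niceness mattering, while still being nowhere near a >50% claim about how the caring is structured.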
(I guess at least a partial counterpoint is “of the aliens that care about our values/freedom, rather than about us as intellectual curiosities, maybe they care a lot about us all getting preserved, and maybe it’s about as hard to put us all in stasis as to just leave us alone.” That seems like more of a stretch and shouldn’t be the mainline guess.)
I currently think “the AIs intrinsically care enough to spend >1/10 billion of resources keeping humans alive” is like (idk) 35% likely and that “acausal/causal trade would incentivize AIs to spend >1/10 billion of resources on keeping humans alive if the AIs care about acausal trade and there isn’t some other entity paying even more for some other outcome” is like 75% likely.
(I think this is similar to what my view was when I wrote this post, but maybe a bit less optimistic after further reflection. I now think the chance that >50% of humans die (or have something happen to them that is similarly bad to death) due to rapid industrial expansion is higher, maybe 35%, and the chance of something which is effectively like extinction is maybe 25% (though the details of what counts as extinction might matter a bunch, and uncertainty about this is driving a bunch of my change in views).)
Agreed there’s some game theoretic reason to preserve humanity just-in-case
[...]
Pascal’s Wager has to account for multiple gods wanting multiple things.
Isn’t it kinda surprising if the highest bidder wants to do something to humans which is as bad or close to as bad as killing them? (As bad from the perspective of typical humans.)
Part of this is that I don’t see this as “just in case”: I’d say that it seems likely that someone is willing to compensate AIs for keeping humans alive, and it’s pretty plausible that the AIs have the smarts/compute to do actual acausal trade prior to them otherwise killing humans (due to rapid industrial expansion at least). This is messier if the best takeover strategies involve killing humans. E.g., in the AI 2027 race scenario, I think the AIs probably would have been able to do acausal trade reasonably prior to killing off the humans.
It occurs to me:
You are more optimistic than I am that our current AIs will care enough to spend 1/billionth of their resources on keeping us alive.
You are separately more optimistic than I am that one could expect the high bidders for trading “we saved the humans” to care about not merely our well-being but our agency.
It seems like those maybe share a crux at how natural niceness is (which is… not exactly double-counting, but, if you were to change your mind about that, probably both of those numbers drop. Is that right?)
Yes, there is an underlying correlation. E.g., if I thought that humans on reflection wouldn’t care at all about bailing out other humans and satisfying their preferences to remain physically alive, this would be evidence both about trade and about the AIs.
Seems high (factoring in the unknown unknowns of other things to care about), but, not crazy.
Isn’t it kinda surprising if the highest bidder wants to do something to humans which is as bad or close to as bad as killing them? (As bad from the perspective of typical humans.)
I don’t even think it’s obvious most humans would/should prefer the non-upload route once they actually understood the situation (like, it seems super reasonable to consider that “not death”), and it’s just a pretty reasonable thing for an AI to say “okay, I do think I just know better than you what you will want after you think about it for a month.”
I also think plenty of high-bidders would have some motivation to help humans, but their goal isn’t obviously “give them exactly what they want” as opposed to “give them a pretty nice zoo that is also optimized for some other stuff.”
highest bidder
At the time the AI is making this call, it hasn’t yet built Jupiter brains and will have uncertainty about the bid spread and who’s out there and whether aliens or acausal trade are even real and which ones are easier to contact.
The upload version gives you a lot more option value – it’s tradeable to the widest variety of beings, and at the very least you can always reconstruct the solar system, so the only people you’re losing bargaining power with are the few aliens who strongly prefer “unmodified solar system continued” vs “reconstructing original unmodified solar system after the fact”, which seems like a very weirdly specific thing to care that strongly about.
(Also, you might get higher bids if you’re actually able to get multiple gods bidding for it. If you only did the non-stasis’d solar system version, you only get to trade with the Very Specific Altruists, and even if they are the highest bidders, you lose the ability to get the Weird Curious Zookeepers bidding the price up)
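To gesture at the option-value point numerically, here’s a toy sketch (Python; the bidder types, valuations, and show-up probabilities are all invented and not from this thread): the stasis/upload product can be sold to every type of bidder that might turn out to exist, while the “left running” product only commands a premium from one type, so its expected best price depends on that one type actually showing up.

```python
# Toy auction sketch: expected best offer when you don't yet know which bidders exist.
# All bidder types, valuations, and probabilities are invented for illustration.
import itertools

SHOW_UP_PROB = 0.5  # assumed chance that any given bidder type exists / is reachable

def expected_best_offer(valuations):
    """Exact expectation of the highest valuation among the bidder types that show up."""
    total = 0.0
    types = list(valuations)
    for present in itertools.product([True, False], repeat=len(types)):
        prob = 1.0
        best = 0.0
        for t, here in zip(types, present):
            prob *= SHOW_UP_PROB if here else (1 - SHOW_UP_PROB)
            if here:
                best = max(best, valuations[t])
        total += prob * best
    return total

stasis_copy  = {"very_specific_altruists": 0.8, "weird_curious_zookeepers": 0.6, "historians": 0.5}
left_running = {"very_specific_altruists": 1.0}  # only one type pays a premium for "left alone"

print(expected_best_offer(stasis_copy))   # ~0.61 with these toy numbers
print(expected_best_offer(left_running))  # 0.50: entirely dependent on one bidder type existing
```

Which matches the intuition above: even if the Very Specific Altruists would bid the most when they do show up, the wider pool of possible buyers gives the stasis version more option value.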
Hmm, it seems that from your perspective “do non-consensual uploads (which humans would probably later be fine with) count as death” is actually a crux for fatality questions. I feel like this is a surprising place to end up because I think keeping humans physically alive isn’t much more expensive and I expect a bunch of the effort to keep humans alive to be motivated by fulfilling their preferences (in a non-bastardized form) rather than by something else.
Intuitively, I feel tempted to call it not death if people would be fine with it on reflection but it seems like a mess and either way not that important.
the only people you’re losing bargaining power with are the few aliens who strongly prefer “unmodified solar system continued” vs “reconstructing original unmodified solar system after the fact”
What about people who want you to not do things to the humans that they consider as bad as death (at least without further reflection)?
Intuitively, I feel tempted to call it not death if people would be fine with it on reflection but it seems like a mess and either way not that important.
Nod, I think this is both fine, and, also, resolving it the other way would be fine.
“do non-consensual uploads (which humans would probably later be fine with) count as death” is actually a crux for fatality questions.
On my end the crux is more like “the space of things aliens could care about is so vast, it just seems so unlikely for it to line up exactly with the preferences of currently living humans.” (I agree “respect boundaries” is a Schelling value that probably has disproportionate weight, but there’s still a lot of freedom in how to implement that, and how to trade for it, and whether acausal economies have a lot of Very Oddly Specific Trades (i.e. saving a very specific group) going on that would cover it.)
The question of whether “nonconsensual uploads that you maybe endorse later” count as death is one I end up focused on mostly because you’re rejecting the previous paragraph.
What about people who want you to not do things to the humans that they consider as bad as death (at least without further reflection)?
I agree that’s a thing, just, there’s lots of other things aliens could want.
(Not sure if cruxy, but, I think the aliens will care about respecting our agency more like the way we care about respecting trees’ agency than the way we care about respecting dogs’ agency)
Or: “we will be more like trees than like dogs to them.” Seems quite plausible they might be more wisely benevolent towards us than humans are towards trees currently.
But, it seems like an important intuition pump for how they’d be engaging with us and what sort of moral reflection they’d have to be doing.
i.e. on the “bacteria → trees → cats → humans → weakly superhuman LLM → … ??? … → Jupiter Brain that does acausal trades” spectrum of coherent agency and intelligence, it’s not obvious we’re more like Jupiter Brains or like trees.
(somewhere there’s a nice Alex Flint post about how you would try to help a tree if you were vaguely aligned to it)