I think there’s a fallacy in going from “slight caring” to “slight percentage of resources allocated.”
Suppose that preserving earth now costs one galaxy that could be colonized later. Even if that one galaxy is merely one billionth of the total reachable number, it’s still an entire galaxy (“our galaxy itself contains a hundred billion stars...”), and its usefulness in absolute terms is very large.
So there’s a hidden step where you assume that the AIs that take over have diminishing returns.
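As a rough illustration of the scale at stake, here is a minimal back-of-the-envelope sketch; the specific numbers (a billion reachable galaxies, a hundred billion stars each) are illustrative assumptions taken from the framing above, not precise cosmological figures:

```python
# Back-of-the-envelope sketch: even "one billionth" of the reachable endowment
# is an entire galaxy, which is enormous in absolute terms.
# Illustrative assumptions only, taken from the framing above.

reachable_galaxies = 1e9   # assumed so that one galaxy is ~one billionth of the total
stars_per_galaxy = 1e11    # "a hundred billion stars" per galaxy

cost_in_galaxies = 1       # preserving earth assumed to cost one marginal galaxy
fraction_of_endowment = cost_in_galaxies / reachable_galaxies
stars_foregone = cost_in_galaxies * stars_per_galaxy

print(f"fraction of endowment given up: {fraction_of_endowment:.0e}")  # 1e-09
print(f"stars given up in absolute terms: {stars_foregone:.0e}")       # 1e+11
```

The point of contention in the rest of the exchange is whether “slight caring” is denominated in the fraction (the 1e-09) or in the absolute value (the 1e+11 stars).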
No, I’m not assuming diminishing returns or a preference for variety. When I talk about sufficient caring to result in an “extremely small amount of motivation”, I just mean “caring enough to allocate ~1/billionth to 1/trillionth of our resources to it”. This is a pretty natural notion of slight caring IMO. I agree that 1/billionth of resources will be very large to an AI in absolute terms, and there is a type of caring which gets divided out by vast cosmic scale.
IMO, it’s pretty natural for slight caring to manifest in this way, and the thing you describe with peaches and apples is more like “slight preferences”. I’m not saying “the AI will slightly prefer humans to survive over humans not surviving (but not more than what it could do with a marginal galaxy)”; I mean that it will actually pay small costs at the margin of the actual decisions it has to make. I agree that AIs might prefer to preserve earth but have this small preference swamped by the vastness of the galactic resources they would have to give up in exchange. I just wouldn’t call this an amount of caring which suffices for paying 1/billionth of resources.
I agree that the way this manifests will depend on the motivational structure of the AI, and motivational structures which purely care about the ultimate arrangement of matter without regard to anything else will end up not caring at all about keeping humans alive (after all, why privilege humans, who just happened to be using the matter/energy to start with). (In the same way, these motivations would kill aliens to tile that matter with something that seems more optimal, even though keeping those aliens alive might be insanely cheap as a fraction of resources.) Humans demonstrate motivations which don’t just care about the ultimate arrangement of matter all the time; e.g., even relatively utilitarian people probably wouldn’t kill everyone on earth to get an additional galaxy, or kill aliens to get their stuff when it would be this cheap to leave them alone.
I think the relevant type of “kindness” is pretty natural, though obviously not guaranteed.
If small motivations do matter, I think you can’t discount “weird” preferences to do other things with Earth than preserve it.
I don’t discount this, see discussion in the section “How likely is it that AIs will actively have motivations to kill (most/many) humans”. Note that it’s pretty cheap to keep humans alive while also doing something that destroys earth.
This is a pretty natural notion of slight caring IMO
Agree to disagree about what seems natural, I guess. I think “slight caring” being relative more than absolute makes good sense as a way to talk about some common behaviors of humans and parliaments of subagents, but is a bad fit for generic RL agents.
It sounds like you are now claiming that superintelligence will have human-like scope insensitivity baked into its preferences? Which seems like an absolutely bonkers thing to claim. “1 billionth of resources” does not at all seem like a natural way for “slight caring” to manifest in an actually-advanced mind; it seems like a thing which very arguably occurs in human minds but is particularly unlikely to generalize to superintelligence, precisely because the generalized version would kneecap many general capabilities quite badly.
It sounds like you are now claiming that superintelligence will have human-like scope insensitivity baked into its preferences?
I think it’s plausible that ASI will have preferences which aren’t totally linear-returns-y and/or don’t just care about the final arrangement of matter. These preferences might be very inhuman. Perhaps you think it’s highly overdetermined that actually-advanced minds would only care about the ultimate arrangement of matter at cosmic scales in a linear-ish way, but I don’t think this is so obvious.
Another way to put this: some types of caring are the types where you’re willing to allocate 1/billionth of your resources to the thing even if you learn that you are extremely rich (so those resources could buy a lot of what you want in absolute terms). But you can also have preferences for what to spend your money on which depend strongly on what options are available; you’d be willing to trade some absolute amount of value for the thing, and if it turns out that you’re wealthier and can buy more in absolute terms, you’d be willing to spend a smaller fraction of your money on the thing.
I’m predicting a type of caring/kindness which is the first kind, rather than a type of preference which cares about questions like “exactly how big is the cosmic endowment”.
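To make that distinction concrete, here is a minimal toy sketch of the two types of caring being contrasted; the function names and numbers are illustrative assumptions, not a model proposed anywhere in this exchange:

```python
# Toy sketch of two notions of "slight caring" (illustrative assumptions only).

def fractional_carer_payment(total_resources: float, fraction: float = 1e-9) -> float:
    """Type 1: willing to spend a fixed ~1/billionth of whatever resources it has."""
    return fraction * total_resources

def absolute_carer_payment(total_resources: float, reservation_value: float = 1.0) -> float:
    """Type 2: willing to trade away at most a fixed absolute amount of value,
    however rich it turns out to be."""
    return min(reservation_value, total_resources)

# Wealth measured in arbitrary units (say, galaxy-equivalents).
for wealth in [1e3, 1e9, 1e15]:
    frac_pay = fractional_carer_payment(wealth)
    abs_pay = absolute_carer_payment(wealth)
    print(
        f"wealth={wealth:.0e}: "
        f"type 1 pays {frac_pay:.0e} ({frac_pay / wealth:.0e} of wealth), "
        f"type 2 pays {abs_pay:.0e} ({abs_pay / wealth:.0e} of wealth)"
    )
```

Under the first type, the absolute amount the agent will pay grows with its endowment, so keeping humans alive stays affordable no matter how large the cosmic endowment turns out to be; under the second, a fixed absolute willingness to pay becomes a vanishing fraction of a vast endowment and gets swamped by the value of a marginal galaxy.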