“The MIRI types” were very explicit that they were doing security-mindset thinking, trying to think about all the things that could possibly go wrong, in advance. This is entirely appropriate and reasonable when not only 8.2 billion lives, but also all their descendants for however long and far the human race would otherwise have gone (at least several orders of magnitude more, quite possibly astronomically more), are on the line.
However, the most likely result of thinking long and hard about everything that could possibly go wrong and then publicly posting long lists of it (if you do it right) is that, actually, nature doesn’t throw you quite that many curveballs — and then people are surprised that things aren’t turning out as badly as MIRI were concerned they might. Now, they did miss one or two (jailbreaks, for example: almost no-one saw that coming before it happened), but in general they managed to think of almost everything Reality has actually thrown at us, plus quite a bit more — and that was the goal.
I’m happy Reality hasn’t been that sadistic with us so far. I try to remember this when updating my P(DOOM). But I’m still not going to rely on this going forwards — complacency is not an appropriate response to existential risk.
yeah I am very grateful for MIRI, and I don’t think we should be complacent about existential risks (e.g. 50% P(doom) seems totally reasonable to me)
I think people should be jumping up and down and yelling if their P(DOOM) is even 0.01%. Extinction is far worse than just killing almost all of the 8.2 billion of us: it also destroys all our potential descendants. Even if you were absolutely certain that we were never going to go to the stars, the average mammalian species lasts O(1 million years), i.e. O(10,000) lifetimes. So a 0.01% P(DOOM) is at least as bad as a certainty of killing almost everyone alive now (but leaving a few to rebuild) — or far worse if there’s any chance of us going to the stars, multiplying the loss astronomically by the volume of our forward lightcone.
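The expected-value arithmetic behind that claim can be sketched directly. This is a rough check only; the ~100-year lifetime used to get O(10,000) cohorts is my round-number assumption, not a figure from the comment:

```python
# Rough expected-value check: a 0.01% extinction probability, weighed
# against the future people that extinction forecloses.
p_doom = 0.0001                        # a "mere" 0.01% chance of extinction

# Assumption: ~1 million years of remaining species lifetime, in
# ~100-year lifetimes, gives ~10,000 population-sized future cohorts.
future_cohorts = 1_000_000 / 100

# Expected loss, in units of "everyone alive today":
expected_loss = p_doom * future_cohorts
print(expected_loss)                   # 1.0 — as bad as certainly killing
                                       # roughly everyone alive now, once
```

So even ignoring the stars entirely, the 0.01% figure already cancels against the O(10,000) future cohorts.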
There is a VAST difference in severity between merely “kill almost everyone” and “make the human species extinct”. Like, at a minimum, 4 orders of magnitude, and quite possibly something more like 10–15 orders of magnitude. (That’s why the DOOM is in capital letters.)
(This is a point that I think some people outside the LW/EA community miss about MIRI — they’re techno-optimists. They’re confident we’re going to the stars, if we can just avoid killing ourselves first. That’s why they keep talking about lightcones. So they believe it’s at least 10^~14 times as bad, not 10^4: they’re also multiplying by the number of habitable planets in the galaxy, and allowing for the fact that a more widespread species is less likely to be wiped out entirely, and so is likely to last longer. Can’t say as I disagree with them, and even if you think the chance of us going to the stars is only, say, 0.1%, that still utterly dominates the calculation.)
[P.S. Footnote added since Eli Tyre questioned my MIRI-hatted very-rough Fermi estimate: O(10^11) stars in the galaxy, say O(0.1%) have a habitable planet, but being widely distributed we last O(100) times as long = extra factor of O(10^~10). Yes, I am ignoring Dyson Swarms, Grabby Aliens, and a lot of other possibilities.]
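The footnote’s Fermi estimate multiplies out as stated; here it is made explicit (all three inputs are the order-of-magnitude guesses from the footnote, not data):

```python
# Reproducing the footnote's Fermi estimate, term by term.
stars_in_galaxy = 1e11       # O(10^11) stars in the galaxy
habitable_fraction = 1e-3    # say O(0.1%) have a habitable planet
longevity_multiplier = 1e2   # a widely-distributed species lasts O(100)x longer

extra_factor = stars_in_galaxy * habitable_fraction * longevity_multiplier
print(f"{extra_factor:.0e}")  # 1e+10 — the O(10^10) extra factor
```

Combined with the ~10^4 factor for Earth-bound descendants, that recovers the ~10^14 figure in the parenthetical above.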
I think there’s an implicit assumption of tiny discount factors here, which are probably not held by the majority of the human population. If your utility function is such that you care very little about what happens after you die, and/or you mostly care for people in your immediate surroundings, your P(DOOM) needs to be substantially higher for you to start caring significantly.
This is not to mention Pascal’s-mugging-type arguments, where you should be wary of making significant life choices based on a tiny probability of some enormous outcome.
This is not to say that I’m against x-risk research – my P(DOOM) is about 60% or so. This is more just to say that I’m not sure people with a non-EA worldview should necessarily be convinced by your arguments.
Discount factors are a cheap stand-in for three effects, none of which apply to P(DOOM):
a) difficulty of predicting the future. That extinction is forever is not a difficult prediction. (In other news, Generalissimo Francisco Franco is still dead.)
b) someone closer to the time (possibly even me) may handle that. But not if everybody is dead.
c) GDP growth rates. Which are zero if everybody is dead.
(Or to quote a bald ASI, even three million years into the future it remains true that: Everybody is dead, Dave.)
But yes, I should have pointed out that in this particular case, the normal assumption that you can safely ignore the far future and it will take care of itself does not apply.
Hmm, perhaps. My intuition behind discount factors is different, but I’m not sure it’s a crux here. I agree that extinction leads to 0 utility for everyone everywhere, but the point I was making was more that with low discount factors the massive potential of humanity has significant weight, while a high discount factor sends this to near 0.
In this worldview, near-extinction is no longer significantly better than extinction.
That aside, I think the stronger point is that if you only care about people near to you, spatially and temporally (as I think most people implicitly do), the thing you end up caring about is the death of maybe 10–1000 people (discounted by your familiarity with them, so probably at most equivalent to ~100 deaths of nearby family) rather than 8,000,000,000.
Some napkin maths as to how much someone with that sort of worldview should care: a 0.01% chance of doom in the next ~20 years then gives ~1% of an equivalent expected death in the next 20 years. 20 years is ~175,000 hours, which would make it about 7.5x less worrisome than driving according to this infographic.
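Redoing that napkin arithmetic explicitly (the ~100-equivalent-deaths figure is the earlier estimate from this thread; the driving comparison depends on the infographic’s own figures, so it is not reproduced here):

```python
# Napkin maths: expected "equivalent deaths" for a parochial worldview.
p_doom_20yr = 0.0001        # 0.01% chance of doom in the next ~20 years
equivalent_people = 100     # ~100 nearby-family-equivalent deaths at stake

expected_deaths = p_doom_20yr * equivalent_people
print(expected_deaths)      # 0.01, i.e. ~1% of one equivalent death

# Spread over the hours in 20 years (computed from scratch here):
hours = 20 * 365.25 * 24    # ~175,000 hours
risk_per_hour = expected_deaths / hours
print(f"{risk_per_hour * 1e6:.3f} micromorts/hour")  # 0.057 micromorts/hour
```

Whether that beats or loses to everyday driving risk then comes down to which per-hour driving figure the infographic uses.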
Again, very napkin maths, but I think my basic point is that a 0.01% P(Doom) coupled with a non-longtermist, non-cosmopolitan view seems very consistent with “who gives a shit”.
Such a person is very badly miscalculating their evolutionary fitness — but then, what else is new?
Number of relations grows exponentially with distance, while genetic relatedness halves with each step, so cumulative relatedness grows with the log of distance: if you have e.g. 1 sibling, 2 cousins, 4 second cousins, etc., each layer will have an equivalent fitness contribution. log2(8 billion) ≈ 33. Fermi estimate of 100 seems around right?
If anything, I get the impression this is overestimating how much people actually care, because there’s probably an upper bound somewhere before this point.
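The layer-counting argument above can be checked directly (under its own assumption that relative counts double each layer while relatedness halves, so each layer contributes roughly one sibling-equivalent):

```python
import math

# Each "layer" of relatives doubles in count (1 sibling, 2 cousins,
# 4 second cousins, ...) while genetic relatedness halves, so each
# layer contributes about one sibling-equivalent of fitness.
layers = math.log2(8e9)   # layers needed to cover the world population
print(round(layers))      # 33

# So the whole population is worth ~33 sibling-equivalents, and the
# O(100) Fermi estimate is within a factor of a few of that.
```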
If your species goes extinct, your genetic fitness just went to 0, along with everyone else’s. Species-level evolution is also a thing.
Is the implication here that you should also be caring about genetic fitness as carried into the future? My basic calculation here was that in purely genetic terms, you should care about the entire earth’s population ~33x as much as a sibling (modulo family trees are a bunch messier at this scale, so you probably care about it more than that).
I feel like at this scale the fundamental thing is that we are just straight up misaligned with evolution (which I think we agree on).
Indeed. I’m enough of a sociobiologist to sometimes put some intellectual effort into trying to be aligned with evolution, but I attempt not to overdo it.
Far more likely, they’re not calculating their evolutionary fitness at all. Our having emotions and values that are downstream of evolution doesn’t imply that we have a deeper goal of maximising fitness.