Guys I might be an e/acc
I read If Anyone Builds It, Everyone Dies (IABIED) and nodded along like everyone else, mostly agreeing with the argument but having minor quibbles about the details or the approach. However, I was recently thinking, “how in support of an AI pause am I, actually?” The authors of IABIED were pretty convincing, but I also know I have different estimates of AI timelines and p(doom) than the authors do. Given my own estimates, what should my view on an AI pause be?
I decided to do some rough napkin math to find out.
A (current number of deaths per year): 60 million
B (guess for years until AGI, no pause): 40 years
C (pause duration, let’s say): 10 years
D (years until AGI, given a pause): B + C = 50 years
E (guess for p(doom), given no pause): 10%
F (guess p(doom) given a pause): 5%
G (current world population, about): 8 billion
H (deaths before AGI, given no pause): A * B = 2.4 billion
I (expected deaths from doom, given no pause): E * G = 800 million
J (total expected deaths, given no pause): H + I = 3.2 billion
K (deaths before AGI, given a pause): A * D = 3 billion
M (expected deaths from doom, given a pause): F * G = 400 million
N (total expected deaths, given a pause): K + M = 3.4 billion
P (additional expected deaths from pausing): N - J = 200 million
Q (additional chance of humanity ceasing to exist if we don’t pause): E - F = 5%
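If you want to check or tweak the arithmetic, here’s a quick Python restatement of the list above (the variable names just mirror the letters; it’s nothing more than the same multiplications and sums):

```python
# Quick Python restatement of the napkin math; the letters match the list above.
A = 60e6           # current deaths per year
B = 40             # years until AGI, no pause
C = 10             # pause duration in years
D = B + C          # years until AGI, given a pause
E = 0.10           # p(doom), no pause
F = 0.05           # p(doom), given a pause
G = 8e9            # current world population

H = A * B          # deaths before AGI, no pause: 2.4 billion
I = E * G          # expected deaths from doom, no pause: 800 million
J = H + I          # total expected deaths, no pause: 3.2 billion

K = A * D          # deaths before AGI, given a pause: 3 billion
M = F * G          # expected deaths from doom, given a pause: 400 million
N = K + M          # total expected deaths, given a pause: 3.4 billion

P = N - J          # additional expected deaths from pausing: 200 million
Q = (E - F) * 100  # extra extinction risk without a pause: 5 percentage points

print(f"P = {P / 1e6:.0f} million extra expected deaths from pausing")
print(f"Q = {Q:.0f} percentage points of extra extinction risk without a pause")
```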
If we pause AI, then based on my estimates, we’ll see an extra 200 million deaths, and the chance of humanity ceasing to exist is halved from 10% to 5%. Is that worth it? That depends on your values.
Let’s say I found out 200 million people were going to die from some kind of meteor strike or something. What lengths would I go to in order to convince people of this? I think the project would become my life. I would be writing essays, calling the news, calling the president – screaming from the mountaintop. If I thought my actions could even reduce the chances of this happening by 1%, I would do it.
Now let’s say I found out that in 200 years, humanity was going to end. For some complex reason, humanity would become infertile: some kind of quirk in our DNA, or some kind of super virus. Or, let’s say people just start having robot babies instead of human babies, because they’re somehow more appealing, or human embryos are secretly replaced by robot embryos by some evil cabal. Anyway, it’s a situation we can only prevent if we start very soon. But let’s say I knew the humans at the end of humanity would be happy. Their non-conscious robot babies acted just like human babies, seemed to grow into people, etc. But once the final human died, the robots would power down and the universe would go dark. I’m trying to create a hypothetical where we have to consider the actual value of humanity having a future, irrespective of the suffering and death that would normally accompany humanity coming to an end. Let’s say I’m convinced this is going to happen, and I thought that if I dedicated my life to stopping it, I could reduce the chances of it happening by 1%. Would I do it?
No way. Maybe the pure ego of being the first person to discover this fact would drive me to write a few essays about it, or even a book. But the eventual extinguishment of humanity just wouldn’t be important enough for me to dedicate my life to. It’s not that I don’t care at all, I just mostly don’t care.
I have empathy, and therefore I want people who exist (or who will exist) to be happy, not suffer, and stay alive. I don’t care about tiling the universe with happy humans. When people place some insanely high value on humanity existing millions of years into the future, that seems to me to be the output of some funny logical process, rather than an expression of their actual internal values.
Let’s do some more napkin math and see how this relates to an AI pause.
R (amount of effort I’d spend to reduce chances of P people dying by 1%): 1000 arbitrary effort units
S (amount of effort I’d spend to reduce chances of humanity gracefully petering out by 1%): 10 arbitrary effort units
T (amount of effort I’d spend to avoid the negative outcomes of pausing AI): R * 100 = 100,000 arbitrary effort units (the extra deaths from pausing are effectively certain, i.e. a 100% chance)
U (amount of effort I’d spend to avoid the negative outcomes of NOT pausing AI): S * Q = 10 * 5 = 50 arbitrary effort units (treating Q as 5 percentage points)
V (is an AI pause favorable?): is U greater than T? Nope.
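And the effort-unit comparison in the same style, restated so the snippet runs on its own:

```python
# The effort-unit comparison; P and Q are carried over from the block above.
P = 200e6   # additional expected deaths from pausing
Q = 5       # extra extinction risk without a pause, in percentage points
R = 1000    # effort units to cut the chance of P people dying by 1%
S = 10      # effort units to cut the chance of humanity petering out by 1%

T = R * 100  # avoiding the downside of pausing: those deaths are certain (100%)
U = S * Q    # avoiding the downside of not pausing: 10 * 5 = 50

print("Is U greater than T?", U > T)  # False, so no pause by these numbers
```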
So I do not favor an AI pause, according to this math. But I wouldn’t really say I’m “for” or “against” a pause, because I’m not confident enough to take a strong position. The math inherits the natural uncertainty in my underlying guesses, and it also makes a lot of assumptions. The idea was just to put down on paper what I believe and see roughly what the natural consequences of those beliefs might be, rather than just passively absorbing an attitude toward AI from my environment.
There were plenty of assumptions here to simplify things, including: that the population won’t increase, that the number of deaths per year will be relatively constant until AGI, that the pause will last 10 years, that capabilities won’t increase during the pause at all (not even theoretical research), and that AI kills everyone instantly or not at all. I also didn’t really factor in suffering directly; I just used death as a proxy.
There may also be factors you think are important that I didn’t include, like the inherent value of non-human (AI) life, the inherent value of animal life/suffering, etc. So feel free to create your own version.
Whether or not you favor a pause might come down to how much you value the lasting future of humanity. Or if you have IABIED-like timelines and p(doom), then there may be a clear case for a pause even in terms of human lives.
I had Claude create a calculator version of my napkin math, so you can try entering your own assumptions into the calculator to see whether you’d be for or against an AI pause. Try it here. (You should choose a negative R value if P is negative!)
Doesn’t your way of calculating things suggest that, if you had the chance to decide between two outcomes:
Everybody dies instantly now
Everybody dies instantly in 100 years
You’d choose the former because you’d end up at a lower number of people dying?
Imagine pausing did not change p(doom) at all and merely delayed inevitable extinction by 10 years. To me that would still be a no-brainer: I’d rather have 10 more years. To you, does that really only boil down to 600 million extra deaths and nothing positive, like, say, 80 billion extra years of life gained?
I’m happy to bite this bullet.
Setting aside the fact that I personally fear death: let’s imagine we’re talking about the universe ending either in 100 or 200 years (and the population keeps growing during that time). I guess I would prefer the former, yes.
More people experiencing some horrible apocalypse and having their lives cut short sounds bad to me. Pausing would mean more years for the people who exist at the 100 year mark, but then you’re also creating more people later who will have their lives tragically cut short. How many young people would it be acceptable to create and destroy in a fiery apocalypse later, to give people who exist now more years?
If this factor is important to you, I encourage you to do your own napkin math that takes it into account! I don’t want anyone to think I’m trying to publish an objectively correct AI pause calculator. I’m just trying to express my own values on paper and nudge others to do the same.
In the post though, you wrote:
So if you’re still biting the bullet under these conditions, then I don’t really get why, unless you’re a full-on negative utilitarian; but then the post could just have said “I think I’m e/acc because that’s the fastest way of ending this whole mess”. :P
I mean, that’s fine and all, but if your values truly imply you prefer ending the world now rather than later, when these are the two options in front of you, then that does some pretty heavy lifting. Because without this view, I don’t think your other premises would lead to the same conclusion.
If we assume roughly constant population size (or even moderate ongoing growth) and your assumption holds that a pause reduces p(doom) from 10 to 5%, then far fewer people will die in a fiery apocalypse. So however we turn it, I find it hard to see how your conclusion follows from your napkin math, unless I’m missing something. (edit: I notice I jumped back from my hypothetical scenario to the AGI pause scenario; bit premature here, but eventually I’d still like to make this transition, because again, your fiery apocalypse claim above would suggest you should rather be in favor of a pause, and not against it)
(I’d also argue that even if the math checks out somehow, the numbers you end up with are pretty close while all the input values (like the 40 year timeline) surely have large error bars, where even small deviations might lead to the opposite outcome. But I notice this was discussed already in another comment thread)
Oh, yeah, I got confused. I originally wrote the post taking into account a growing population, but removed that later to make it a bit simpler. Taking into account a growing population with an extra 1 or 2 billion people, everyone dying later is worse because it’s more people dying. (Unless it’s much later, in which case my mild preference for humanity continuing kicks in.) With equal populations, if everyone dies in 100 or 200 years it doesn’t really matter to me, besides a mild preference for humanity continuing. But it’s the same amount of suffering and number of lives cut short because of the AI apocalypse.
I think that I’d do this math by net QALYs and not net deaths. My guess is that doing it that way may actually change your result.
I’m not trying to avoid dying; I’m trying to steer toward living.
I agree with your general idea of not caring much about the abstract notion of future potential people, but I still think your numbers are so approximate as to be useless, especially considering how small the margin you get is (a 200 million difference on a ~3.4 billion total is about a 6% margin; tiny mistakes in estimation could push the outcome the other way).
Problems I have with your model: you entirely discount suffering risks (by your own admission), which IMO cover a lot of possible bad AI futures, and more generally all the suffering on the path to doom besides just the deaths (ignoring that works for a fast-takeoff foom, but I don’t think that is the likely mode of doom). But also, are you just assuming that upon inventing AGI, death is immediately solved and everyone becomes immortal? That seems a huge stretch to me. In most scenarios other than “AGI fooms to ASI overnight, but it’s, like, aligned and good” there is a long-ish transition period before developments that outlandish.
I agree that the numbers are so approximate as to be relatively useless. I feel like the useful part of this exercise for me was really in seeing how uncertain I am about whether or not we should have an AI pause. Relatively small differences in my initial assumptions could sway the issue either way. It’s not as if the cautious answer is obviously to pause, which I assumed before. Right now I’m extremely weakly against.
Yes, I am assuming, mostly for the sake of simplicity, that superintelligent AGI cures mortality immediately. I don’t think it would be likely to take more than 10 years, though, which is why I’m comfortable with that simplification. I’m also comfortable using deaths as a proxy for suffering because I don’t expect a situation where the two diverge, e.g. an infinite-torture torment nexus scenario.
Even without divergence, a few decades of suffering could be enough to move such a close calculation. Nor am I so sure about the infinite torment nexus scenario (by your metric, even just “the AI keeps human society alive but in a bad state and without giving anyone immortality” would count as this).
I also think the immortality expectation is wildly ungrounded. I can’t think of how even a superintelligent AI would cure mortality other than maybe uploads, which I doubt are possible. And anyway if all you count is deaths… everyone dies in the end, at some point. Be it the Sun going red giant or the heat death of the universe. So I’d say considering how good their lives have been until that point seems paramount.
Honestly, I’m not even sure we can call any of this a calculation, given the uncertainty. It just seems like a bunch of random guesswork. The main thing I’m learning from all this is how uncertain I am, and how skeptical I am of anyone who claims to be more certain.
I don’t think it should be hard to believe that a superintelligent AI could cure mortality. For example, it could quickly cure all diseases and biological aging, and dramatically reduce the incidence of accidents. Then we’d have lifespans of something like 10,000 years, and that’s 10,000 years for the superintelligent AI to become even more superintelligent and figure something out.
I agree that everyone dies at some point, but if that happens in a trillion years, presumably we’ll at least have figured out how to minimize the tragedy and suffering of death, aside from the nonexistence itself.
I agree that accounting for suffering could possibly make a difference, but that sounds harder than just estimating deaths and I’m not sure how to do it. I’m pretty sure it will shift me further against a pause though. A pause will create more business-as-usual suffering by delaying AGI, but will reduce the chances of doom (possibly). I don’t expect doom will involve all that much suffering compared to a few decades of business-as-usual suffering, unless we end up in a bad-but-alive state, which I really doubt.
That’s mostly just life extension. There would still be plenty of potential for death, and I’m not sure whether e.g. stopping aging would also save your brain from all forms of decay. Besides, that kind of knowledge takes experimentation; even an ASI can’t possibly work everything out purely from first principles. And keeping human experimentation ethical (which hopefully an aligned ASI would care about, otherwise we’re well and truly screwed) is a big bottleneck in finding such things out. It would at least slow the discovery a bit.
I don’t see why that would be the case. I think you’re too focused on an ASI singleton fooming and destroying everyone overnight as your doom scenario. A more likely doom scenario is: AGI gets invented. Via regular economic incentives, it slowly prices all humans out of labour, leading to widespread misery only partially mitigated by measures such as UBI, if those are passed at all. Power and control get centralised enormously in the hands of those who own the AIs (AI CEOs and such). The economy gets automated, and eventually more and more executive decisions are delegated to ever smarter AGIs. At some point this completely spins out of control: the AIs aren’t well-aligned, so they start e.g. causing more and more environmental degradation, building their own defences, and so on and so forth. Then humanity mostly ends, either because the environment no longer supports life or in a last desperate fight to regain control. A few leftovers (the descendants of the original AI owners) may survive within protected environments, completely disempowered, if they managed to align the AIs at least that much.
What would you rate such a future at? Lots of deaths, not necessarily complete extinction, but also lots of suffering on the road. And I would honestly say this is my most likely bad outcome right now.
Honestly, the more I engage with this thread, the less certain I become that any of this conversation is productive. Yeah, that’s one way the future could go. It feels less like discussing whether a potential drug will be safe or not, and more like discussing how many different types of angels there will turn out to be in heaven. There’s just so little information going into this discussion that maybe the conclusion from all of this is that I’m just unsure.
What about all the future people that would no longer get a chance to exist—do they count? Do you value continued existence and prosperity of human civilization above and beyond the individual people? For me, it’s a strong yes to both questions, and that does change the calculus significantly!
Also, is the calculator setting non-doom post-AGI mortality to zero by capping the horizon at AGI and counting only pre-AGI deaths?
For example: time to AGI|no pause = 10y & pause = 10y. Then the calculator will arrive at 60m x 10 = 600 million deaths for no-pause vs 60m x 20 = 1.2 billion for pause. But if post-AGI mortality only halves, the fair comparison for the no-pause path over the same 20-year horizon should be 60m x 10 + 0.5 x 60m x 10 = 900 million.
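To make that concrete, here’s a rough sketch with a post-AGI mortality multiplier m (m = 0 seems to be what the calculator assumes now; m = 0.5 is the “mortality only halves” case):

```python
# Rough sketch: compare both paths over the same 20-year horizon,
# with post-AGI mortality scaled by m instead of set to zero.
deaths_per_year = 60e6
years_to_agi_no_pause = 10
pause_years = 10
horizon = years_to_agi_no_pause + pause_years  # 20 years
m = 0.5  # post-AGI mortality multiplier (0 = the calculator's apparent assumption)

# No pause: 10 pre-AGI years at full mortality, then 10 post-AGI years at m.
no_pause = (deaths_per_year * years_to_agi_no_pause
            + m * deaths_per_year * (horizon - years_to_agi_no_pause))
# Pause: AGI arrives at year 20, so all 20 years are at full mortality.
pause = deaths_per_year * horizon

print(no_pause / 1e6, "million vs", pause / 1e6, "million")  # 900.0 vs 1200.0
```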
I made the assumption that mortality is cured by superintelligent AGI if the AGI is aligned, yes.
Yes, quite weird to put zero value for people after AGI.
If you e.g. expect p(extinction|no pause)-p(extinction|pause) = 0.1%, and 1 trillion people after AGI, pausing saves a billion people after AGI in expectation.
I think it is definitely not a classical utilitarian view, but that doesn’t trouble me. If you are a classic utilitarian, you can always put the value of S super high.
To explain briefly why I don’t care about trying to create more people: I am motivated by empathy. I want people who exist to be doing well, but I don’t care very much about maximizing the number of people doing well. Utilitarianism seems to imply we should tile the universe with flourishing humans, but that doesn’t seem valuable to me. I don’t see every wasted sperm cell as some kind of tragedy, a future person who could have existed. I don’t think the empty universe before humanity came along was good or bad. I don’t think in terms of good or bad. Things just are, and I like them or I don’t. I don’t like when people suffer or die. That’s it.
I was most confused about ‘S’, and likely understood it quite differently than intended.
I understood S as roughly “Humanity stays in control after AGI, but slowly (over decades/centuries) shrinks and becomes less relevant”. I’d expect that in many of these cases something morally valuable would replace humans. So I put S lower than R.
Could it make sense to introduce “population after AGI you care about” as a term? I think this could be clearer.
I meant for this to be factored into the scenario by S. I almost don’t care at all about future people getting a chance to exist, so I put S very low. If you disagree, you’ll set S higher.
There was a thought experiment in the post to try to help you think about how much you care about humanity vs. individual humans that exist.
Has the LW crowd ever adjusted for one thing that is (I suppose) common to the majority of the most active and established doomers here and elsewhere, and that makes their opinions so uniform: they have all become successful and important people, who achieved high fulfilment and (although not the major factor) capital and wealth in this present life of theirs. They all have a great deal to lose if perturbations happen. I have never seen anything about this peculiar issue here on LW. Aren’t they all just scared of descending to the level of the less fortunate majority, and might that be the only true reason for them being doomers? Oh, this is so stupid, if it’s so; there will be no answer, only selective amnesia. Like Yudkowsky: who is he if AI is not going to kill its parents? In that case he’s nobody. There’s no chance he’s even able to consider this; his life is a bet on him being somebody.
Does not sound plausible to me. If all worries about AI somehow magically disappeared overnight (God descends from Heaven and provides a 100% credible mathematical proof that any superhuman AI will necessarily be good), Yudkowsky would still be the guy who wrote the Sequences, founded the rationalist community, created a website where the quality of discourse is visibly higher than on the rest of the internet, etc. With the threat of AI out of the way, the rationalist community would probably focus again on developing the art of human rationality, increasing the sanity waterline, etc.
Also, your argument could be used to dismiss anything. Doctors talking about cancer? They just worry that if people are no longer afraid of diseases, no one will treat the doctors as high-status anymore. Etc.
I don’t really agree with or like your comment very much, but I think buried underneath the negativity there is something valuable about making sure not to be too personally invested in what you believe about AI. You should be able to change your mind without that affecting your public reputation or whatever. It is possible some people have found meaning in the AI alignment mission, and if it turned out that mission was counterproductive, it might be hard for them to accept that.
nah, I really wanna chill the fuck out. I’ve sacrificed a really high ratio of the good things in my life to prevent the bad future, and I legitimately think even very shitty, ground-down-by-society lives in no-tech or oppressed areas are going to be worsened and then extinguished by AI on the default trajectory. I get why you’d wonder this and don’t think you’re a fool for thinking it; I’m slightly irritated at your rudeness, even though I get why you’d be rude in a comment like this. But it just doesn’t seem true to me. I turn down all sorts of opportunities to use my knowledge of AI to make a bunch of money fast, e.g. by having a nice job or starting another company. (I sold out of my last one at the minimum stock price the cofounder would accept when I got spooked by the idea of having it on my balance sheet, though in retrospect that was probably not a very productive thing to do in order to influence the world for the better.)
Yudkowsky has said similar things: someone donated enough money to him that he’s set for life, so he has no financial incentive to push a worldview now. He’s literally just losing time he could spend on doing something more fun, and he does spend time on the more-fun things anyway.
It’s not a fun hobby. If you could give me real evidence that things are action-unconditionally fine, that we don’t have to work really hard to achieve even the P(doom) people already anticipate, then I’d be very excited to chill out and have a better paying job.
That said, yes, agree that the majority of your reasoning about how to make the world better should focus on people who are not in his situation, and that it’s slightly cringe to be worrying about really well-off people losing their wealth rather than how to make the world better for poor people and sick people and so on.
But mostly I expect that evolution does not favor nice things very hard, that it takes a while for it to get around to coughing up nice things, and that if you speed up memetic evolution a lot it mostly lets aggressive replicators win for a while until cooperation groups knit back together. seems like we—all humans and all current AIs—are ripe to be beaten by an aggressive replicator unless we get our act together hard about figuring out how to make durably defensible cooperative interactions, much more durable than has ever been achieved before.
Else your house (or datacenter) eventually gets paved over by a self-replicating factory, and before that, your mind gets paved over by a manipulative memeplex. fair to worry that any particular memeplex (including the alignment memeplex) might be doing this, though, there is very much a history of memeplexes saying “we’re the real cooperative memeplex” and turning out to have either lied or simply been wrong.
unconditional P(doom) is 10%?! no wonder your numbers are so weird!
my unconditional P(doom) is more like 60% at the moment. it’s down from P(doom|no ai safety community) which is around 99%. Seems like a bit much to expect that safety acceleration could pull doom down to 10%.
My p(doom) is around 10% probably in part because I imagine a pretty slow takeoff. I don’t think 60% is necessarily unreasonable, nor 1% nor 99%. It’s hard to estimate from essentially zero actual information.
Can I ask why you think the AI safety community has been/will be so impactful (99% → 60%)? I think you believe the community has much more reach and power than it actually does.
Technical research trajectory, mostly; I see paths through current technical alignment research which might be able to pull a rabbit out of a hat, camp A style. Also some chance of slowdown but most of my success probability comes from the possible futures where current technical research hunches pan out and let us know important attributes of a learning system that let us be sure that running it results in mostly-good outcomes for most minds’ preferences, in some cev-ish sense. Mostly this depends on wizard power, not command power.
What is there left to figure out, that would take so long?
I think trying to create intelligence via gradient descent is a dead end, so we’ll have to switch to an entirely different and more expensive architecture. I’ll probably write a post about that soon.