Bostrom’s Footnote 21 seems innocuous, but to me it unravels a lot of the argument Bostrom is making.[1]
Bostrom’s central go/no-go model suggests a P(doom) of up to 97% is acceptable if life expectancy rises to 1,400 years post-AGI.
Footnote 21 clarifies, “more generally, we could take P(doom) to be the expected fraction of the human population that dies when superintelligence is launched.”
So, suppose one could extend life 1,400 years for 3% of humans at the cost of killing the other 97% right now. How should we respond to this deal? Bostrom’s go/no-go model says to accept, but I think people would overwhelmingly find the deal morally reprehensible.
It’s straightforward to observe that the two deals, Bostrom’s and mine, are mathematically identical for the go/no-go model, which assumes a person-affecting framework. But I think something more subtle happens if we step slightly outside the person-affecting framework.
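Here is the arithmetic, as a minimal sketch (I’m assuming a current remaining life expectancy of roughly 40 years, which together with the 1,400-year figure is what produces the ~97% threshold):

```python
# Sketch of the go/no-go arithmetic (my assumptions: ~40 years of remaining life
# expectancy today, 1,400 years after a successful launch).
current_le, post_le = 40, 1400

# Launching raises expected lifespan whenever (1 - pdoom) * post_le > current_le,
# so the break-even P(doom) is 1 - current_le / post_le.
print(1 - current_le / post_le)      # ~0.971, i.e. roughly 97%

# Bostrom's deal: with probability 97% everyone dies; with probability 3% everyone
# gets the 1,400-year life expectancy.
print(0.03 * post_le)                # expected lifespan per person: 42 years
# My deal: 97% of people are killed for certain; the other 3% get 1,400 years.
print(0.97 * 0 + 0.03 * post_le)     # expected lifespan per person: 42 years
```

Both deals come out to the same 42 expected years per currently-living person, just above the 40-year status quo, so the go/no-go model is indifferent between them.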
If you have even slight impersonal preferences, I think that you should decide Bostrom’s deal is strictly worse than mine. Therefore, I think that you should support Bostrom’s deal only if (a) you solely follow selfish motivations (which most people think is bad) or (b) you support my deal (and my deal seems horrible).
Why is Bostrom’s deal strictly worse than mine, if you have slight impersonal preferences? Under Bostrom’s deal, in 97% of cases everyone is dead. Under my deal the surviving 3% could have further generations who continue aspects of life we value, potentially flourishing for a long time.
One could disagree and propose ways my deal seems worse than Bostrom’s, but I think my deal can simply be modified to accommodate each objection until it is no longer worse. For example, if you feel it’s fairer for everyone to face the same risk, suppose the 3% are selected by random sampling in my deal. If you favor Bostrom’s deal because it doesn’t split families and friends across living vs. killed, suppose the sampling in my deal is by large clusters of families and friends. If you favor Bostrom’s scenario because of non-lifespan ASI benefits, suppose an ASI confers those same benefits on the surviving 3% in my deal. Wherever you end up, my deal is still killing 97% of people to benefit 3%, so my deal is still horrible. Yet, Bostrom’s deal remains worse.
I am sympathetic to the idea that Bostrom set out, reasonably, to investigate ASI within a person-affecting framework. Unfortunately—and irrespective of Bostrom’s initial intentions—I think that his article has mainly turned out to be a list of justifications for the small, probably malevolent group of people who would accept a deal that kills 97% of humans to extend the lives of 3%.

[1] Always read the footnotes!
The go/no-go model is not meant to show that a P(doom) of up to 97% is “acceptable” (or at least it would risk being highly misleading to say that). The model is only meant to show that up to that level of risk, launching superintelligence increases life expectancy under the given assumptions. That model ignores many important factors (such as distributional considerations and diminishing marginal utility in QALYs), which is why a series of more complicated models are introduced that take into account some of these other factors. (Even the most elaborate of the models introduced is still only very schematic and leaves out much that is relevant, as all formal models of this sort do. “For these and other reasons, the preceding analysis—although it highlights several relevant considerations and tradeoffs—does not on its own imply support for any particular policy prescriptions.”).
By the way, there may also be reasons to regard implementing a lottery that would involve going out and killing some random subset of the human population differently from allowing technological progress to continue—even if we were to stipulate that the two cases were exactly parallel with respect to some set of consequentialist outcome metrics. (Also, while in your example, using randomization would equalize people’s chances or ex ante expected lifespans, it would lead to radically uneven ex post outcomes. Some people with egalitarian intuitions care about inequality of outcomes, not only inequality of chances or opportunities—especially in cases where the inequality of outcomes is not connected to personal motivations, efforts, or choices.)
Thank you for your response!

The go/no-go model is not meant to show that a P(doom) of up to 97% is “acceptable”
I should have been clearer, yes. I meant that the 97% deals are acceptable to your go/no-go model, not to you or your later models.
However, I think my arguments apply equally to your later models, just with P(doom) different from 97%. (See below.)
a series of more complicated models are introduced that take into account some of these other factors.
Thank you for running the more complicated models. (And, in case it was unclear, I did read your whole article before making any comments.)
Do the models help us understand how we should act? Here is how I look at it:
It’s difficult to get intuition about how good or bad the optimal ASI launch times are, because the envisioned situation is so far from experience.
In each model, there is a remaining P(doom) at the model-optimal launch time, call it R%. R% is sometimes large, sometimes small.
From a perspective that stays entirely within the specific person-affecting framework used by the models, the following two deals have equal value (a minimal check follows the two deals):
Deal 1: At ASI launch, all humans are killed with R% probability, but life expectancy becomes superhuman with probability 100% - R%.
Deal 2: At ASI launch, R% of humans are killed with certainty, which is the cost of providing superhuman life expectancy for the remaining 100% - R%.
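As a minimal check of that equivalence, counting only expected remaining lifespan per currently-living person (a sketch; it ignores everything else the models might track):

```python
# Expected remaining lifespan per currently-living person, for any residual risk R
# and post-launch life expectancy L (everything else ignored).
def deal1(R, L):
    # All are killed with probability R; otherwise everyone gets life expectancy L.
    return (1 - R) * L + R * 0

def deal2(R, L):
    # A fraction R is killed for certain; the remaining 1 - R get life expectancy L.
    return (1 - R) * L

assert deal1(0.32, 1400) == deal2(0.32, 1400)  # equal for any choice of R and L
```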
What happens if we put one foot outside the model assumptions and begin to care somewhat about future generations? Are the lessons of the models robust to a small step away from their assumptions?
If we care about future generations, this breaks the tie. Deal 1 seems worse than Deal 2, since under Deal 1 there may easily be no future generations at all, while under Deal 2 the survivors can repopulate, future generations can have happiness, and so on. Even if one believes Deal 1 is not strictly worse than Deal 2, Deal 1 hardly seems much better.
Let’s now return to viewing things as we actually see them, without restricting ourselves to the rules of a specific framework. It’s difficult for me to get an intuitive grasp on Deal 1, but Deal 2 is easier to understand. There are lots of similar historical precedents. Deal 2 is explicitly killing a large proportion of humans to let the others prosper more. It’s horrible.
We have seen that, stepping even slightly outside the model’s person-affecting stance, Deal 1 is worse than Deal 2. Yet, Deal 2 is plainly awful. So what is the value of simulating optimal times for models to accept Deal 1? It is like modeling the “best” time to enact horror.
I should mention that the models often find a high R% to be optimal. For example, with 50% initial P(doom) and 5%/yr safety progress, the model you used in Table 3 would launch when P(doom) remains 32%, with an overall life expectancy of 774 years, instead of waiting a handful of decades for P(doom) to fall to near 0. I noticed this when I coded your Appendix A model for the sensitivity analysis in my other comment (which also found a small error in your Table 3; see that comment for details).
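For concreteness, here is roughly the kind of calculation I mean. It is a toy sketch rather than the exact Appendix A specification: I assume a constant-hazard baseline with a 40-year remaining life expectancy, a 1,400-year life expectancy after a successful launch, an initial P(doom) of 50%, and safety progress cutting P(doom) by 5% (multiplicatively) per year of delay:

```python
import math

BASELINE_LE = 40.0    # remaining life expectancy today (years)
POST_LE = 1400.0      # remaining life expectancy after a safe launch (years)
P0 = 0.50             # initial P(doom)
SAFETY_RATE = 0.05    # fractional reduction in P(doom) per year of delay

def expected_lifespan(delay_years):
    """Expected remaining lifespan of someone alive today if ASI is launched after delay_years."""
    survive = math.exp(-delay_years / BASELINE_LE)       # chance of living until the launch
    years_while_waiting = BASELINE_LE * (1.0 - survive)  # expected years lived before launch
    pdoom = P0 * (1.0 - SAFETY_RATE) ** delay_years      # residual P(doom) at launch
    return years_while_waiting + survive * (1.0 - pdoom) * POST_LE

best = max(range(0, 201), key=expected_lifespan)
print(best)                                        # optimal delay: about 9 years
print(round(P0 * (1.0 - SAFETY_RATE) ** best, 2))  # residual P(doom) at launch: about 0.32
print(round(expected_lifespan(best)))              # expected lifespan: about 774 years
```

Under these toy assumptions the optimum arrives after only about nine years of delay, while P(doom) is still around 32% and the expected lifespan is roughly 774 years, rather than after the several decades it would take for P(doom) to approach zero.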
Some people … care about inequality of outcomes, not only inequality of chances
I neglected this, thank you for raising it. I don’t think it undermines my argument, but let me know if you disagree.
Infinite regress issue?
I apologize if you addressed this somewhere or if I misunderstand, but is there an infinite regress issue with your models?
Take three times, T0, T1, and T2, each a year apart. Let average life expectancy among the living be 40 years at each time. Suppose the model is run at T0 and it says “Delay ASI launch 3 years.” However, if the model is run again at T1, it will still say “Delay ASI launch 3 years,” because the model explicitly cares only about those alive at the time it is run. The same happens at T2 and later. So one would never reach the time to launch ASI.