Thank you for explaining.
The societal readiness plot doesn’t seem to have a log-ish y axis, considering the shapes of the trend lines.
If the alignment plots were also drawn without a log-ish y axis, then they might look as bad as the societal readiness plot or—if not equally bad—then at least substantially worse than they do now.
I’m questioning plotting decisions for fake plots… I know. This may seem like splitting hairs. However, to me, there is a major difference between requiring linear vs. exponential/power-law improvement in alignment to take us to “what we need”.
The plots state that their x axes show capabilities on log scale, but what scales were intended for the y axes?
We might expect that the y axes are on linear (untransformed) scale. However, this would imply that multiplicative increases in AI capability can be addressed safely by making only additive amounts of progress in alignment (dashed green line on plots).
In general, multiplicative outmatches additive. How can we be confident that additive alignment progress would be enough?
Alternatively, we could view y axes as being on log scale. Yet, then the gap between “actual” (solid green line) and “what we need” (dashed green line) can be much larger than visually apparent on the plots, especially for dates in the future, leaving a big space for the region helpfully labeled, “A lot of bad shit can still happen [here].”
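To spell out the two readings (assuming roughly straight trend lines, with $C$ the capability level on the log-scaled x axis and $A$ the plotted alignment level):

$$\text{linear } y:\quad A \approx a + b\log C \;\Rightarrow\; \text{each } 10\times \text{ increase in } C \text{ adds only a constant increment to } A;$$

$$\text{log } y:\quad \log A \approx a + b\log C \;\Rightarrow\; A \propto C^{\,b}, \text{ so vertical gaps on the plot are ratios, not differences.}$$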
Many people may feel we are in your scenario A under a log y axis, with a big gap between “actual” and “what we need” appearing in the future if current trends persist. In particular, people with fragile-world concerns may place more “Everybody dies” dots in that gap, considering how large it could be.
Thank you for the plots! I hope I did not misinterpret them.
Thanks & no apology needed : )
Thank you to your dad for offering to answer questions.
Sometimes people make the argument that the U.S. needs to race toward AGI-ASI as rapidly as possible, because if China obtains it first, then the risks to the U.S. are unacceptably high. However, this argument can also be an appealing excuse for people in the U.S. who would wish to go full-speed toward AGI-ASI even if there were no competition from China.
I imagine similar arguments are also made in China, with the roles of the U.S. and China reversed.
Does your dad have thoughts about these kinds of arguments, considering that analogous arguments were made about the nuclear arms race? How does your dad think about the interaction of people who make these arguments genuinely vs. those who use these arguments as an excuse?
Thank you for your response!
> The go/no-go model is not meant to show that a P(doom) of up to 97% is “acceptable”
I should have been clearer, yes. I meant that the 97% deals are acceptable to your go/no-go model, not to you or your later models.
However, I think my arguments apply equally to your later models, just with P(doom) different from 97%. (See below.)
> a series of more complicated models are introduced that take into account some of these other factors.
Thank you for running the more complicated models. (And, in case unclear, I did read all your article before making any comments.)
Do the models help us understand how we should act? Here is how I look at it --
It’s difficult to get intuition about how good or bad the optimal ASI launch times are, because the envisioned situation is so far from experience.
In each model, there is a remaining P(doom) at the model-optimal launch time, call it R%. R% is sometimes large, sometimes small.
From a perspective that stays entirely within the specific person-affecting framework used by the models, the following deals have equal value:
Deal 1: At ASI launch, all humans are killed with R% probability, but life expectancy becomes superhuman with probability 100% - R%.
Deal 2: At ASI launch, R% of humans are killed with certainty, which is the cost of providing superhuman life expectancy for the remaining 100% - R%.
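To make the equivalence explicit (a minimal check, writing $L$ for the post-launch life expectancy and assuming the survivors in Deal 2 are chosen at random): within the person-affecting accounting, each currently living person faces the same per-person lottery in both deals,

$$E[\text{Deal 1}] \;=\; \left(1 - \tfrac{R}{100}\right) L \;=\; E[\text{Deal 2}],$$

and the only difference, which the expectation cannot see, is that in Deal 1 the $R\%$ risk is perfectly correlated across everyone, while in Deal 2 it is not.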
What happens if we put one foot outside model assumptions and begin to care some about future generations? Are lessons of the models robust to a small step away from their assumptions?
If we care about future generations, this breaks the tie. Deal 1 seems worse than Deal 2, since under Deal 1 there may easily be no future generations, while under Deal 2 future generations can repopulate, have happiness, and so on. Even if one believes Deal 1 is not strictly worse than Deal 2, Deal 1 hardly seems much better.
Let’s now return to viewing things however we actually see them, without adopting a specific framework whose rules we restrict ourselves to follow. It’s difficult for me to get an intuitive grasp on Deal 1, but Deal 2 is easier to understand. There are lots of similar historical precedents. Deal 2 is explicitly killing a large proportion of humans to let the others prosper more. It’s horrible.
We have seen that, stepping even slightly outside the model’s person-affecting stance, Deal 1 is worse than Deal 2. Yet Deal 2 is plainly awful. So what is the value of simulating optimal times for models to accept Deal 1? It is like modeling the “best” time to enact horror.
I should mention that the models often find high R% to be optimal. For example, with 50% initial P(doom) and 5%/yr safety progress, the model you used in Table 3 would launch when P(doom) remains 32%, with overall life expectancy of 774 years, instead of waiting a handful of decades for P(doom) to fall to near 0. I noticed this when I coded your Appendix A model for the sensitivity analysis in my other comment (which also found a small error in your Table 3 -- see that comment for details).
> Some people …. care about inequality of outcomes, not only inequality of chances
I neglected this, thank you for raising it. I don’t think it undermines my argument, but let me know if you disagree.
Infinite regress issue?
I apologize if you addressed this somewhere or if I misunderstand, but is there an infinite regress issue with your models?
Take three times, T0, T1, and T2, each a year apart. Let average life expectancy be 40 years among the living at each time. Suppose the model is run at T0 and it says “Delay ASI launch 3 years”. However, if the model is run again at T1, it will still say “Delay ASI launch 3 years” because the model explicitly only cares about the living. And the same at T2 or later. So one would never reach the time to launch ASI.
Bostrom’s Footnote 21 seems innocuous, but to me it unravels a lot of the argument Bostrom is making.[1]
Bostrom’s central go/no-go model suggests a P(doom) of up to 97% is acceptable if life expectancy rises to 1,400 years post-AGI.
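(Presumably the 97% comes from a simple expected-years comparison inside the model, assuming a baseline remaining life expectancy of roughly 40 years: a 3% chance of 1,400 years is $0.03 \times 1400 = 42$ expected years, slightly more than the roughly 40 years forgone, and the break-even point is $1 - 40/1400 \approx 97\%$.)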
Footnote 21 clarifies, “more generally, we could take P(doom) to be the expected fraction of the human population that dies when superintelligence is launched.”
So, suppose one could extend life 1,400 years for 3% of humans at the cost of killing 97% right now. How should we reply to this deal? Bostrom’s go/no-go model says to accept, but I think people would overwhelmingly find the deal morally reprehensible.
It’s straightforward to observe that the two deals, Bostrom’s and mine, are mathematically identical for the go/no-go model, which assumes a person-affecting framework. But I think something more subtle happens if we step slightly outside the person-affecting framework.
If you have even slight impersonal preferences, I think that you should decide Bostrom’s deal is strictly worse than mine. Therefore, I think that you should support Bostrom’s deal only if (a) you solely follow selfish motivations (which most people think is bad) or (b) you support my deal (and my deal seems horrible).
Why is Bostrom’s deal strictly worse than mine, if you have slight impersonal preferences? Under Bostrom’s deal, in 97% of cases everyone is dead. Under my deal the surviving 3% could have further generations who continue aspects of life we value, potentially flourishing for a long time.
One could disagree and propose ways my deal seems worse than Bostrom’s, but I think my deal can simply be modified to accommodate those concerns until it is better. For example, if you feel it’s more fair for everyone to face the same risk, suppose the 3% are selected by random sampling in my deal. If you favor Bostrom’s deal because it doesn’t split families and friends across living vs. killed, suppose the sampling in my deal is by large clusters of families and friends. If you favor Bostrom’s scenario because of non-lifespan ASI benefits, suppose an ASI confers those same benefits to the surviving 3% in my deal. Wherever you end up, my deal is still killing 97% of people to benefit 3%, so my deal is still horrible. Yet Bostrom’s deal remains worse.
I am sympathetic to the idea that Bostrom set out, reasonably, to investigate ASI within a person-affecting framework. Unfortunately—and irrespective of Bostrom’s initial intentions—I think that his article has mainly turned out to be a list of justifications for the small, probably malevolent group of people who would accept a deal that kills 97% of humans to extend the lives of 3%.
- ^
Always read the footnotes!
- ^
Bostrom’s results seem very sensitive to deviations from a wholly person-affecting perspective. To investigate, I coded up the model from Appendix A with one modification: I supposed that, instead of being wholly self-interested, people are willing to sacrifice 10% of life expectancy for the sake of all future generations.
My method was to calculate the launch time that is later than the selfishly optimal time-point, but only by enough that life expectancy is reduced 10% from the selfish optimum.[^1] This method is crude, but it illustrates how rush-to-launch loses support if one walks mildly away from a person-affecting view.
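For concreteness, here is a minimal Python sketch of that search. The outer procedure is the method just described; the functional forms inside `expected_life_years` are my reading of the setup (a constant baseline mortality hazard giving about 40 remaining years, a constant fractional reduction of P(doom) per year of delay, and a fixed 1,400-year post-launch life expectancy) and may differ in detail from the exact Appendix A formulas:

```python
import math

# Assumed stand-in parameters (my reading of the setup, not necessarily the exact Appendix A values).
BASELINE_LE = 40.0          # remaining life expectancy without ASI, in years
MU = 1.0 / BASELINE_LE      # constant baseline mortality hazard
POST_ASI_LE = 1400.0        # life expectancy after a safe launch, in years

def p_doom_at(t, p0, progress):
    """P(doom) if ASI is launched after waiting t years, where `progress` is the
    fractional reduction in P(doom) per year (e.g. 0.10 for 10%/yr)."""
    return p0 * (1.0 - progress) ** t

def expected_life_years(t, p0, progress):
    """Expected remaining life years for someone alive now, if launch happens at time t:
    years lived while waiting + P(survive the wait) * P(survive the launch) * post-ASI years."""
    survive_wait = math.exp(-MU * t)
    lived_while_waiting = (1.0 - survive_wait) / MU
    return lived_while_waiting + survive_wait * (1.0 - p_doom_at(t, p0, progress)) * POST_ASI_LE

def delayed_launch_time(p0, progress, sacrifice=0.10, horizon=200.0, step=1.0 / 12.0):
    """Selfishly optimal launch time, and the latest launch time whose expected
    life years are still >= (1 - sacrifice) * the selfish optimum."""
    times = [i * step for i in range(int(horizon / step) + 1)]
    values = [expected_life_years(t, p0, progress) for t in times]
    best = max(range(len(times)), key=lambda i: values[i])
    floor = (1.0 - sacrifice) * values[best]
    i = best
    # Expected life years decline after the optimum, so walk forward until they hit the floor.
    while i + 1 < len(times) and values[i + 1] >= floor:
        i += 1
    return times[best], times[i]

selfish_t, delayed_t = delayed_launch_time(p0=0.20, progress=0.10, sacrifice=0.10)
print(f"selfish optimum ~ {selfish_t:.1f} y; 10%-sacrifice launch ~ {delayed_t:.1f} y")
```

Under these assumed forms, the sketch approximately reproduces the Table 3 launch times and the delays reported below (use `sacrifice=0.5` for the Table A figures).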
For example, with 20% $P_{doom}$ and 10%/yr safety progress, the selfishly optimal launch time is 8 months (Bostrom’s Table 3), which offers you 1,120 years of life expectancy. If you are willing to sacrifice 10% of that life expectancy (leaving you with 1,008 expected years of life!) for future generations, you would wait 11 years before launch to help safety become established. More generally, all superintelligence launch times from Table 3 were delayed by at least 4 years (none were ASAP anymore) and many were delayed 10-20 years. The rush to superintelligence was ameliorated.
Last, I estimated delays under more sacrifice. If people are willing to lose half of life expectancy to help ensure the existence of future generations, then superintelligence launch times would be delayed by at least 28 years for all scenarios covered in Bostrom’s Table 3. Results are below. Further, for all cases with $P_{doom}$ of 80% or less, the life expectancies of those making the sacrifice would remain generous, exceeding 140 years for $P_{doom}$ of 80%, exceeding 349 years for $P_{doom}$ of 50%, and exceeding 550 years for $P_{doom}$ of 20% or less.
Table A. ASI launch delay by P(doom) and safety progress, offering 50% of life expectancy. The table includes the scenarios from Bostrom’s Table 3.
| Safety progress | P(doom) = 1% | 5% | 20% | 50% | 80% | 95% | 99% |
|---|---|---|---|---|---|---|---|
| No progress (0%/yr)* | 29 y | 29 y | 29 y | 30 y | 35 y | Never launch | Never launch |
| Glacial (0.1%/yr) | 29 y | 29 y | 30 y | 32 y | 42 y | Never launch | Never launch |
| Very slow (1%/yr) | 29 y | 30 y | 32 y | 43 y | 80 y | 102 y | 108 y |
| Moderate (10%/yr) | 29 y | 31 y | 38 y | 47 y | 52 y | 54 y | 54 y |
| Brisk (50%/yr) | 29 y | 31 y | 33 y | 34 y | 35 y | 35 y | 35 y |
| Very fast (90%/yr) | 29 y | 30 y | 31 y | 31 y | 31 y | 31 y | 31 y |
| Ultra fast (99%/yr) | 29 y | 30 y | 30 y | 30 y | 30 y | 30 y | 30 y |

\* Note that the sacrifice is fruitless in this case, because there is no safety progress during the delay. Similarly, the sacrifice may not have reasonable justification in the ultra-fast case.
[^1]: To confirm my code’s correctness, I also recreated Bostrom’s Table 3. This revealed a typo in Table 3: For $P_{doom} = 0.95$ and safety progress of 1%, the launch time is listed as 14.3 years but should be about 31.4.
Fair enough & I appreciate the follow-up.
Though I will say: it seems we need to find lessons somewhere in history, in part because we aren’t smart enough as a species to reason purely from first principles. I’m certainly not smart enough for that, anyway.
When looking for lessons on AI, nuclear development may be the least worst historical analogy.