I really wish I could agree. I think we should definitely think about flourishing when it’s a win/win with survival efforts. But saying we’re near the ceiling on survival looks wildly too optimistic to me. This is after considering our position and the best estimate of our odds in some depth, primarily around the challenge of aligning superhuman AGI (including the attendant societal complications).
There are very reasonable arguments to be made about the best estimate of alignment/AGI risk. But disaster likelihoods below 10% really just aren’t viable when you look in detail. And it seems like that’s what you need to argue that we’re near the ceiling on survival.
The core claim here is “we’re going to make a new species which is far smarter than we are, and that will definitely be fine because we’ll be really careful how we make it” in some combination with “oh we’re definitely not making a new species any time soon, just more helpful tools”.
When examined in detail, assigning high confidence to those statements is just as silly as it looks at a glance. Making a new, smarter species is obviously a very dangerous thing, and one we’ll do pretty much as soon as we’re able.
90% plus on survival looks like a rational view from a distance, but there are very strong arguments that it’s not. This won’t be a full presentation of those arguments; I haven’t written it up satisfactorily yet, so here’s the barest sketch.
Here’s the problem: The more people think seriously about this question, the more pessimistic they are.
And time-on-task is the single most important factor for success in every endeavor. It’s not a guarantee, but it dwarfs raw intelligence as a predictor of success in every domain (although the two are multiplicative).
The “expert forecasters” you cite don’t have nearly as much time-on-task thinking about the AGI alignment problem. Those who actually work in that area are systematically more pessimistic the longer and more deeply they’ve thought about it. The correlation isn’t perfect, but it’s quite large.
This should be very concerning from an outside view.
The causation here clearly runs both ways, but selection only starts to explain the correlation. Those who intuitively find AGI very dangerous are prone to go into the field, and they’ll be subject to confirmation bias. But if they were wrong, a substantial subset should shift away from that view once they’re exposed to every argument for optimism. That effect would be amplified by the correlation between rationalist culture and alignment thinking: valuing rationality provides resistance (but certainly not immunity!) to motivated reasoning and confirmation bias, by aligning one’s motivations with updating on arguments and evidence.
I am an optimistic person, and I deeply want AGI to be safe. I would be overjoyed for a year if I somehow updated to only a 10% chance of AGI disaster. It’s only by correcting for my own biases that I keep looking hard enough at the pessimistic arguments to be convinced by their logic.
And everyone is affected by motivated reasoning, particularly the optimists. This is complex, but after doing my level best to correct for my own motivations, it looks to me like bias has far more leeway to operate when there’s less evidence to push against. The more evidence and arguments someone has considered, the less hold bias takes. This is drawn from the literature on motivated reasoning and confirmation bias, which was my primary research focus for a few years and a primary consideration for the last ten.
That would’ve been better as a post or shortform, and more polished. But there it is, FWIW: a dashed-off version of an argument I’ve been mulling over for the past couple of years.
I’ll still help you aim for flourishing, since having an optimistic target is a good way to motivate people to think about the future.
Thanks. On the EA Forum, Ben West points out the clarification: I’m not assuming x-risk is lower than 10%; in my illustrative example I suggest 20%. (My own views are a little lower than that, but not an OOM lower, especially given that this percentage is about locking into near-zero-value futures, not just extinction.) I wasn’t meaning to place a lot of weight on the superforecasters’ estimates.
Actually, because I ultimately argue that Flourishing is only 10% or less, for this part of the argument to work (i.e., for Flourishing to be greater in scale than Surviving), I only need x-risk this century to be less than 90%! (Though the argument gets a lot weaker the higher your p(doom).)
The point is that we’re closer to the top of the x-axis than we are to the top of the y-axis.
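To spell out the arithmetic (a minimal sketch in my own notation, not the post’s: let s be the probability we survive this century, so x-risk is 1 − s, and let f be the probability of a near-best flourishing future conditional on survival, with f ≤ 0.10 as in the illustrative figures above):

```latex
% If flourishing conditional on survival is at most 10%, its remaining
% headroom is at least 90%. Survival's remaining headroom is just x-risk.
\[
  \underbrace{1 - s}_{\text{x-risk (survival headroom)}} \;<\; 0.90 \;\le\; \underbrace{1 - f}_{\text{flourishing headroom}}
  \quad\Longrightarrow\quad
  1 - s \;<\; 1 - f .
\]
```

So as long as x-risk this century is under 90%, there is less headroom left on the survival axis than on the flourishing axis, which is one way of reading the claim that we’re closer to the top of the x-axis than the y-axis (taking the x-axis as survival and the y-axis as flourishing).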
Thank you! I saw that comment and responded there. I said that it really clarified the argument and that, given the clarification, I largely agree.
My one caveat is that if we screw up alignment, we could easily destroy more than our own chance at flourishing. I think it’s pretty easy to get a paperclipper expanding at near-c and snuffing out every civilization in our light cone before it gets its chance to prosper. So raising our odds of flourishing should be weighed against the risk of ruining a bunch of other civilizations’ chances. One likely non-simulation answer to the Fermi paradox is that we’re early to the game. We shouldn’t risk losing big if our loss keeps others from getting to play.
I hadn’t considered this tradeoff closely because in my world models survival and flourishing are still closely tied together. If we solve alignment we probably get near-optimal flourishing. If we don’t, we all die.
I realize there’s a lot of room in between; that model comes down to the way I think goals, alignment, and human beings work. I think intent alignment is more likely, which would put a human (or a few humans) in charge of the future. Most humans would agree that flourishing sounds nice if they had long enough to contemplate it; very few people are so sociopathic or sadistic that they’d want to prevent flourishing in the very long term.
But that’s just one theory! It’s quite hard to guess and I wouldn’t want to assume that’s correct.
I’ll look in more depth at your ideas of how to play for a big win. I’m sure most of it is compatible with trying our best to survive.
What do you mean by “solve alignment”? What is your optimal world? What you consider “near-optimal flourishing” is likely very different from many other people’s ideas of near-optimal flourishing. I think people working on alignment are just punting on this issue for now while they figure out how to implement intent and value alignment, but I expect a lot of conflict about which values a model will be aligned to, and whom it will be aligned to, if/when we have the technical ability to align powerful AIs.
Does the above chart assume all survival situations are better than non-survival? Because that is a DANGEROUS assumption to make.
I agree that we should shift our focus from pure survival to prosperity. But I disagree with the dichotomy the author seems to be proposing: survival and prosperity can’t be cleanly separated, because long-term prosperity is impossible under a high risk of extinction.
Perhaps a more productive formulation would be the following: “When choosing between two strategies, both of which increase the chances of survival, we should give priority to the one that may increase them slightly less, but at the same time provides a huge leap in the potential for prosperity.”
However, I believe the strongest scenarios are those that eliminate the need for such a compromise altogether: options that simultaneously increase survival and ensure prosperity, creating a synergistic effect. It is the search for such scenarios that we should focus on.
In fact, I am working on developing one such idea. It is a model of society that simultaneously reduces the risks associated with AI and destructive competition and provides meaning in a world of post-labor abundance, while remaining open and inclusive. This is precisely in the spirit of the “viatopias” that the author talks about.
If this idea of the synergy of survival and prosperity resonates with you, I would be happy to discuss it further.
I like the name viatopia, but perhaps search for a clearer, simpler name, like opentopia.
Great work as always. I’m not sure I agree that we should be focusing on flourishing conditional on survival. I think a bigger concern is risks of astronomical suffering, which seem like almost the default outcome: e.g. digital minds, wild animals in space colonization, and unknown unknowns. It’s possible that the interventions would overlap, but I’m skeptical.
I also don’t love the citations for a low p(doom). Toby Ord’s guess was from 2020, the superforecaster survey from 2022, and prediction markets aren’t really optimized for this sort of question. Something like Eli Lifland’s estimate or the AI Impacts surveys is where I would start as a jumping-off point.