Thanks for writing this. I agree with your logic that shutdown is easier than controlled takeoff, but I think controlled takeoff is much more politically viable.
I see four major blockers to a full shutdown:
Many experts currently think alignment isn’t incredibly hard
Most current humans are neither utilitarian nor longtermist
Not dying personally is a large incentive
Winner controls the future
Given all of that, I’d expect covert or overt government projects to continue even if we get a treaty banning private research. (Which I do find quite possible; I expect scary demos to happen naturally).
There’s one set of experts saying alignment is almost impossible, and another group saying it’s probably doable as long as we aren’t dumb about it. That’s not a rational reason for a shutdown if you’re not longtermist (edit: and if, like most decision-makers, you’re older, so a shutdown probably means you personally die).
A full shutdown, and surviving without one, seem to share a critical-path component: conceptual clarification of, and communication about, alignment difficulty on the current path seem necessary for either route. That’s what it would take to shift expert opinion enough to get a full shutdown, and it would improve the odds of solving alignment in time if we don’t get one. So I’d love to see more effort in that direction (like we got with your The Title is Reasonable post!)
I really hope I’m missing something! I’ll continue advocating for shutdown as a means to any viable slowdown, but I’d love to have some genuine hope for it!
The only way I can see to prevent some programs from continuing is if experts were to unify behind “alignment is highly unlikely to succeed using current methods”. If everyone with a clue said “we’ll probably die if we try it any time soon”, that might be enough to dissuade selfish decision-makers. This would require some amazing communication and clarification. If we take Paul Christiano as representative of expert optimists, he had p(doom) from misalignment at around 20% (and another ~20% on other catastrophes following soon after… but those seem separable). Those are decent odds if you only care about yourself and your loved ones.
Lots of people would jump at the chance to gamble the entire future against their own immortality on those odds. If that’s representative of people in the labs, I don’t see how we prevent those in power from gambling the future.
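To make that concrete, here is a deliberately crude toy sketch of the selfish gamble; every number in it is an illustrative assumption I am inventing for the sketch, not anyone’s actual estimate:

```python
# Toy expected-value sketch of the "selfish gamble" described above.
# All numbers are illustrative assumptions, not real estimates.

p_doom = 0.20              # assumed chance that racing to ASI kills everyone
p_good = 1 - p_doom        # chance it goes well enough to deliver radical medicine
p_cure_without_asi = 0.10  # assumed chance an older decision-maker sees radical
                           # life extension arrive in their lifetime without ASI

years_if_cured = 1000      # stand-in for "indefinite lifespan", capped for the toy model
years_baseline = 15        # assumed remaining life expectancy otherwise

ev_race = p_good * years_if_cured + p_doom * 0
ev_wait = p_cure_without_asi * years_if_cured + (1 - p_cure_without_asi) * years_baseline

print(f"Expected personal years if racing:  {ev_race:.0f}")   # 800
print(f"Expected personal years if waiting: {ev_wait:.0f}")   # ~114

# A purely self-interested expected-value reasoner with these (made-up) numbers
# races despite a 20% chance of killing everyone, because the calculation
# ignores everything except their own expected years.
```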
And people like Christiano and Shah clearly do understand the problem. So shifting their odds dramatically seems like it would take some breakthrough in conceptual clarification of the problem, or communication, or more likely, both.
Improvements in understanding and communicating the difficulty of the alignment problem seem like critical-path items for a global shutdown or even slowdown, and also for attempting alignment in worlds where those don’t happen. That’s why my efforts are going there.
(Given the logic that shutdown is highly unlikely, my best-case hope is that an international treaty agrees that a very few teams, probably one US and one Chinese, proceed, while all others are banned; enforcement would happen like any other treaty. Such projects would ideally be public, and would communicate with each other on alignment risks and achievements. The idea would be that everyone agrees to roughly split the enormously growing pie, and to share generously with the rest of the world.)
Given that logic, it seems inevitable to me that at least one or two projects push ahead fairly quickly, even in a best-case scenario. That’s why my efforts focus on that scenario, where we try alignment on roughly the current path.
Having raised this question, let me state clearly: I think we should shut it all down. I state this publicly and will continue to do so. I think alignment is probably hard and the odds of achieving it under race conditions are not good. If we get a substantial slowdown, that probably means I’ll personally die, and I’d take that trade to improve humanity’s odds. But I’m probably more utilitarian and longtermist than 99% of humanity.
So: Am I missing something?
(edited for clarity just after first posting)
That’s not a rational reason for a shutdown if you’re not longtermist (edit: and if, like most decision-makers, you’re older, so a shutdown probably means you personally die).
This reads as if ‘longtermism’ and ‘not caring at all about future generations or people who would outlive you’ are the only possibilities.
Those are decent odds if you only care about yourself and your loved ones.
This assumes none of your loved ones are younger than you.
If someone believes a pause would meaningfully reduce extinction risk but also reduce their chance of personal immortality, they don’t have to be a ‘longtermist’ (or utilitarian, altruist, scope-insensitive, etc) to prefer to pause, just care enough about some posterity.
(This isn’t a claim about whether decision-makers do or don’t have the preferences you’re ascribing. I’m saying the dichotomy between those preferences and ‘longtermism’ is false, and also (like Haiku’s sibling comment) I don’t think they describe most humans even though ‘longtermism’ doesn’t either, and this is important.)
Good points; I agree with all of them. It’s hard to know how to weigh them.
My mental model does include those gradients even though I expressed them as categories.
I currently think it’s all too likely that decision-makers would accept a 20% or more chance of extinction in exchange for the benefits.
One route is to make better guesses about what happens by default. The other is to try to create better decisions by spreading the relevant logic.
Those who want to gamble will certainly push the “it should be fine and we need to do it!” logic. The two sets of beliefs will probably develop symbiotically. It’s hard to separate emotional from rational reasons for beliefs.
It looks to me like people automatically convince themselves that what they want to do emotionally is also the logical thing to do. See Motivated reasoning/confirmation bias as the most important cognitive bias for a brief discussion.
Based on that logic, I actually think human cognitive biases and cognitive limitations are the biggest challenge to surviving ASI. We’re silly creatures with a spark of reason.
I think there are just very few people for whom this is a compelling argument. I don’t think governments are coming anywhere close to explicitly making this calculation. I think some people in labs are maybe making this decision, but they aren’t actually the target audience for this.
I agree that governments aren’t coming anywhere close to making this calculation at this point, though they very well might once they’ve actually thought about the issue. I think it will depend a lot on their collective distribution of p(doom). I’d expect them to push ahead if they could convince themselves it was lower than maybe 20% or thereabouts. I’d love to be convinced they would be more cautious.
Of course I think it very much depends on who is in the relevant governments at that time. I think that the issue could play a large role in elections, and that might help a lot.
Lots of people would jump at the chance to gamble the entire future against their own immortality on those odds.
This would assume that those people are also convinced that something like radical life extension is possible in principle, and that more advanced AI would be required for delivering it.
I have no idea how many people that is true for. Many people dismiss suggestions of radical life extension with the same reflexive “that’s sci-fi” dismissal that AI x-risk scenarios get. Even if they became convinced about AI, life extension might stay in that category.
And if they did get convinced of its possibility, the most likely scenario I could see would be if advances in more narrow AI had already delivered proofs of concept. You could imagine it being solved by just something like extensive biological modeling tools that were more developed than what we have today, but did not yet cross the threshold to transformative AI.
It seems to me that believing ASI can kill you and believing ASI can save you are both pretty directly downstream of believing in ASI at all. Since the premise is that everyone believes pretty strongly in the possibility of doom, it seems they’d mostly get there by believing in ASI and would mostly also believe in the upside potentials too.
There are several intermediate steps in the argument from ASI to doom.
Yes. But because we’re discussing a scenario in which the world is ready to slow down or shut down AGI research, I’m assuming those steps have been crossed.
The biggest step IMO, “alignment is hard”, doesn’t intervene between taking ASI seriously and thinking it could prevent you from dying of natural causes.
Thank you for your high-quality engagement on this and for including the clear statement!
I think my most substantial disagreement with you on the difficulty of a shutdown is related to longtermism. Most normal people would not take a 5% risk of destroying the world in order to greatly improve their lives and the lives of their children. That isn’t because they are longtermist, but primarily because they are simply horrified by the concept of destroying the world.
It is in fact almost entirely utilitarians who are in favor of taking that risk, because they are able to justify it to themselves after doing some simplified calculation. Ordinary people, rational or irrational, who just want good things for themselves and their kids usually don’t want to risk their own lives, certainly don’t want to risk their kids’ lives, and it wouldn’t cross their mind to risk other people’s kids’ lives, when put in stark terms.
“Human civilization should not be made to collapse in the next few decades” and “humanity should survive for a good long while” are longtermist positions, but they are also what >90% of people in every nation on earth already believe.
Most normal people would not take a 5% risk of destroying the world in order to greatly improve their lives and the lives of their children.
Polls suggest that most normal people expect AGI to be bad for them and they don’t want it. I’m more speculating here, but I think the typical expectation is something like “AGI will put me out of a job; billionaires will get even richer and I’ll get nothing.”
This isn’t terribly decision-relevant except for deciding what type of alignment work to do. But that does seem nontrivial. My bottom line is: push for a pause/slowdown, but don’t get overoptimistic. Simultaneously work toward alignment on the current path, as fast as possible, because that might well be our only chance.
To your point:
I take your point on the standard reasoning. I agree that most adults would turn down even 95–5 odds (19 to 1) of improving their lives against a small chance of destroying the world.
But I’m afraid those with decision-making power would take far worse odds in private, where it matters. That’s because for them, it’s not just a better life; it’s immortality vs. dying soon. And that tends to change decision-making.
I added and marked an edit to make this part of the logic explicit.
Most humans with decision-making power, e.g. in government, are 50+ years old, and mostly older, since power tends to accumulate at least until sharp cognitive declines set in. There’s a pretty good chance they will die of natural causes if ASI isn’t created to do groundbreaking medical research within their lifetimes.
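For a rough sense of how much the stakes framing matters, here is a second toy sketch (again, every utility below is a number I am inventing purely for illustration): the break-even extinction risk for an expected-value reasoner depends on the ratio of what they stand to gain to everything they would count as lost.

```python
# Toy break-even calculation: the largest p(doom) at which racing still has
# non-negative expected value for someone weighing gain against perceived loss.
# All utilities are invented for illustration only.

def break_even_risk(gain_if_success: float, loss_if_doom: float) -> float:
    """p(doom) at which (1 - p) * gain - p * loss = 0."""
    return gain_if_success / (gain_if_success + loss_if_doom)

# Framing 1: "a somewhat better life", while counting the loss of everyone
# else's future as enormous (how most people seem to feel about destroying the world).
print(break_even_risk(gain_if_success=10, loss_if_doom=10_000))   # ~0.001

# Framing 2: "immortality vs. dying soon", counting only one's own ~15
# remaining years as the loss.
print(break_even_risk(gain_if_success=985, loss_if_doom=15))      # 0.985
```

On the first framing even a one-in-a-thousand risk is too much; on the second, a purely selfish reasoner tolerates risks far above anything the expert optimists quote. That is the shift I worry about.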
That’s on top of any actual nationalistic tendencies, or fears of being killed, enslaved, tortured, or worse, mocked, by losing the race to one’s political enemies covertly pursuing ASI.
And that’s on top of worrying that sociopaths (or similarly cold/selfish decision-makers) are over-represented in the halls of power. Those arguments seem pretty strong to me, too.
How this would unfold is highly unclear to me. I think it’s important to develop gears-level models of how these processes might happen, as Raemon suggests in this post.
My guess is that covert programs are between likely and inevitable. Public pressure will be for caution; private and powerful opinions will be much harder to predict.
As for the public statements, it works just fine to say, or more likely to convince yourself, that you think alignment is solvable with little chance of failure, and that patriotism and horror over (insert enemy ideology here) controlling the future are your motivations.
Given all of that, I’d expect covert or overt government projects to continue even if we get a treaty banning private research. (Which I do find quite possible; I expect scary demos to happen naturally).
Why is this special to Shutdown, vs Controlled Takeoff? (Here, I’m specifically comparing two plans that both route through “first, do a pretty difficult political action of getting countries to agree to centralize GPUs”). If you just expect people to defect from that, what’s the point?
The scenario I’m thinking of mostly is the overt version, which is controlled takeoff. A few governments pursue controlled takeoff projects. This is hopefully done in the open and relatively collaboratively across US and Chinese teams. I’d assume they’d recruit from existing teams, effectively consolidating labs under government control. I’d hope that the Western and Chinese teams would agree to share their results on capabilities as well as alignment, although of course they’d worry about defection on that, too.
If they did it covertly that would be defecting. I haven’t thought about that scenario as much.
This scenario doesn’t seem like it would require consolidating GPUs, just monitoring their usage to some degree. It seems like it would be a lot easier to not make that part of the treaty.
The post is specifically about “globally controlled takeoff”, in which multiple governments have agreed to locate their GPUs in locations that are easy for each other to inspect.
There’s a spectrum between “Literally all countries agree to consolidate and monitor compute”, “US/UK/China do it”, “US/UK/Europe agree to do it among themselves”, “US does it just for itself” and “individual orgs are just being idk a bit careful and a bit cooperative in an ad-hoc fashion.”
I call the latter end of the spectrum “ad hoc semi-controlled semi-slowed takeoff” at the beginning of the post. If we get something somewhere in the middle, that seems like probably an improvement.
I thought I was addressing the premise of your post: the world is ready to put serious restrictions on AI research; do they do shutdown or controlled takeoff?
I guess maybe I’m missing what’s important about the physical consolidation vs other methods of inspection and enforcement.
I think my scenario conforms to all of the gears you mention. It could be seen as adding another gear: the incentives/psychologies of government decision-makers.