Getting the AI industry to take AI risk seriously is a necessary and sufficient condition for survival.
I’m going to play devil’s advocate against this claim.
First: the AI industry taking AI risk seriously is not obviously a sufficient condition for survival. In the long run, the hard technical problems would still have to be solved in order to safely use AI. And there would still be a timer before someone built an unsafe AI: FLOPs would presumably still keep getting cheaper, and publicly-available algorithms and insights would keep accumulating (even if somewhat less quickly). Even with the whole AI industry on board, sooner or later some hacker would build an unsafe AI in their basement.
Getting the whole AI industry on board would buy time. It would not, in itself, be a win condition.
Second: getting the AI industry to take AI risk seriously is not obviously a necessary condition. It is necessary that people working on alignment have a capabilities lead. However, as you mention in the post:
Moreover, I don’t think alignment and capabilities are orthogonal. I think they’re very much positively correlated.
It is true that today’s alignment researchers do not have any significant capabilities edge (or at least aren’t showing it). But today’s alignment researchers are also not even close to solving the alignment problem. I expect that an alignment research group which was able to solve the hard parts of alignment would also be far ahead of the mainstream on capabilities, because the two are so strongly correlated. I very much doubt that one could figure out how to robustly align general intelligence without also figuring out how to build it efficiently.
A strong positive correlation between the alignment and capabilities research problems means that non-alignment researchers win the capabilities race mainly in worlds where the alignment researchers aren’t able to solve the alignment problem anyway.
Getting the whole AI industry on board would buy time. It would not, in itself, be a win condition.
Mm, I don’t think we’re disagreeing here; I just played fast and loose with definitions. Statement: “If we get the AI industry to take AI Safety seriously, it’s a sufficient condition for survival.”
If “we” = “humanity”, then yes, there’ll still be the work of actually figuring out alignment left to do.
I had “we” = “the extant AI Safety community”, in the sense that if the AI industry is moved to that desirable state, we could (in theory) just sit on our hands and expect others to solve alignment “on their own”.
I expect that an alignment research group which was able to solve the hard parts of alignment would also be far ahead of the mainstream on capabilities, because the two are so strongly correlated
But isn’t that a one-way relationship? Progressing alignment progresses capabilities, but progressing capabilities doesn’t necessarily strongly progress alignment (otherwise there’d be no problem to begin with). And I guess I still expect that alignment-orthogonal research would progress capabilities faster. (Or, at least, that it’d be faster up to some point. Past that point alignment research might become necessary for further progress… But that point is not necessarily below the level of capabilities that kills everyone.)
Specifically, do you agree with Eliezer that preventing existential risks requires a “pivotal act” as described here (#6 and #7)?
Eliezer did define “pivotal act” so as to be necessary. It’s an act which makes it so that nobody will build an unaligned AI; that’s pretty straightforwardly necessary for preventing existential risk, assuming that unaligned AI poses an existential risk in the first place.
However, the danger in introducing concepts via definitions is that there may be “pivotal acts” which satisfy the definition but do not match the prototypical picture of a “pivotal act”.
Yeah, I guess the answer is yes by definition. Still wondering what kind of pivotal acts people are thinking about—whether they’re closer to big power grabs like “burn all the GPUs”, or softer governance methods like “publishing papers with alignment techniques” and “encouraging safe development with industry groups and policy standards”. And whether the need for a pivotal act is the main reason why alignment researchers need to be on the cutting edge of capabilities.
I can’t see how “publishing papers with alignment techniques” or “encouraging safe development with industry groups and policy standards” could be pivotal acts. To prevent anyone from building unaligned AI, you also have to prevent someone from building an unaligned AI in their garage. That requires preventing people who don’t read the alignment papers or policy standards, and who aren’t members of the industry groups, from building unaligned AI.
That, in turn, appears to me to require at least one of: 1) limiting garage access to computational resources, 2) limiting garage hackers’ knowledge of the techniques needed to build unaligned AI, 3) somehow convincing all garage hackers not to build unaligned AI even though they could, or 4) surveillance and intervention to prevent anyone from actually building an unaligned AI even though they have the computational resources and knowledge to do it. Surveillance, under option 4, could (theoretically; I’m not saying all of these possibilities are practical) be carried out by humans, by too-weak-to-be-dangerous AI, or by aligned AI.
“Publishing papers with alignment techniques” and “encouraging safe development with industry groups and policy standards” might well be useful actions. It doesn’t seem to me that anything like that can ever be pivotal. Building an actual aligned AI, of course, would be a pivotal act.
“Building an actual aligned AI, of course, would be a pivotal act.” What would an aligned AI do that would prevent anybody from ever building an unaligned AI?
I mostly agree with what you wrote. Preventing all unaligned AIs forever seems very difficult and cannot be guaranteed by soft influence and governance methods. These would only achieve a lower degree of reliability, perhaps constraining governments and corporations via access to compute and critical algorithms but remaining susceptible to bad actors who find loopholes in the system. I guess what I’m poking at is, does everyone here believe that the only way to prevent AI catastrophe is through power-grab pivotal acts that are way outside the Overton Window, e.g. burning all GPUs?
“Building an actual aligned AI, of course, would be a pivotal act.” What would an aligned AI do that would prevent anybody from ever building an unaligned AI?
My guess is that it would implement universal surveillance and intervene, when necessary, to directly stop people from doing just that. Sorry, I should’ve been clearer that I was talking about an aligned superintelligent AI. Since unaligned AI killing everyone seems pretty obviously extremely bad according to the vast majority of humans’ preferences, preventing that would be a very high priority for any sufficiently powerful aligned AI.
Thanks, that really clarifies things. Frankly I’m not on board with any plan to “save the world” that calls for developing AGI in order to implement universal surveillance or otherwise take over the world. Global totalitarianism dictated by a small group of all-powerful individuals is just so terrible in expectation that I’d want to take my chances on other paths to AI safety.
I’m surprised that these kinds of pivotal acts are not more openly debated as a source of s-risk and x-risk. Publish your plans, open yourselves to critique, and perhaps you’ll revise your goals. If not, you’ll still be in a position to follow your original plan. Better yet, you might convince the eventual decision makers of it.
“It is necessary that people working on alignment have a capabilities lead.” Could you say a little more about this? Seems true but I’d be curious about your line of thought.
The theory of change could be as simple as “once we know how to build aligned AGI, we’ll tell everybody”, or as radical as “once we have an aligned AGI, we can steer the course of human events to prevent future catastrophe”. The more boring argument would be that good ML research happens on the cutting edge of the field, so alignment researchers need big budgets and fancy labs just like everyone else. Would you take a specific stance on which of these is most important?