we won’t simply get “only one chance to correctly align the first AGI”
We only get one chance at a “sufficiently critical try”, meaning an AI at a level of power where you lose control over the world if you fail to align it. I expect there are no claims to the effect that there will be only one chance to correctly align the first AGI.
A counterargument from no-FOOM should probably claim that there will never be such a “sufficiently critical try” at all, because at every step of the way it would be possible to contain a failure of alignment at that step and try again and again until you succeed, as normal science and engineering always do.
I expect there are no claims to the effect that there will be only one chance to correctly align the first AGI.
For the purpose of my argument, there is no essential distinction between ‘the first AGI’ and ‘the first ASI’. My main point is to dispute the idea that there will be a special ‘it’ at all, which we need to align on our first and only try. I am rejecting the scenario where a single AI system suddenly takes over the world. Instead, I expect AI systems will assume more control over the world continuously and gradually: not one decisive system, but an ongoing process of AIs accumulating greater control over time.
To understand the distinction I am making, consider the analogy of genetically engineering humans. By assumption, if the technology continues improving, there will eventually be a point where genetically engineered humans will be superhuman in all relevant respects compared to ordinary biological humans. They will be smarter, stronger, healthier, and more capable in every measurable way. Nonetheless, there is no special point at which we develop ‘the superhuman’. There is no singular ‘it’ to build, which then proceeds to take over the world in one swift action. Instead, genetically engineered humans would simply get progressively smarter, more capable, and more powerful over time as the technology improves. At each stage of technological innovation, these enhanced humans would gradually take on more responsibilities, command greater power in corporations and governments, and accumulate a greater share of global wealth. The transition would be continuous rather than discontinuous.
Yes, at some point such enhanced humans will possess the raw capability to take control of the world by force. They could theoretically coordinate to launch a sudden coup against existing institutions and seize power all at once. But the default scenario seems more likely: a continuous transition from a world controlled by ordinary humans to a world controlled by genetically engineered superhumans. They would gradually occupy positions of power through normal economic and political processes rather than through sudden conquest.
For the purpose of my argument, there is no essential distinction between ‘the first AGI’ and ‘the first ASI’.
For the purpose of my response, there is no essential distinction there either, except that the book might be implicitly relying on the claim that building an ASI is certainly a “sufficiently critical try” (if something weaker isn’t one already). That claim makes the argument more confusing if left implicit, and poorly structured if used within the argument at all rather than outside of it.
The argument is still not that there is only one chance to align an ASI (this is a conclusion, not the argument for that conclusion). The argument is that there is only one chance to align the thing that constitutes a “sufficiently critical try”. A “sufficiently critical try” is conceptually distinct from “ASI”. The premise of the argument isn’t about a level of capability alone, but rather about lack of control over that level of capability.
One counterargument is to reject the premise and claim that even an ASI won’t constitute a “sufficiently critical try” in this sense, that is, even an ASI won’t successfully take control over the world if misaligned, probably because by the time it’s built there are enough checks and balances that it can’t (at least individually) take over the world. And indeed this seems to be in line with the counterargument you are making: you don’t expect there to be a lack of control, even as we reach ever higher levels of capability.
Nonetheless, there is no special point at which we develop ‘the superhuman’. There is no singular ‘it’ to build, which then proceeds to take over the world in one swift action.
Thus there is no “sufficiently critical try” here. But if there were one, it would be a problem, because we would then have to get it right the first time. Since in your view there won’t be a “sufficiently critical try” at all, you reject the premise, which is fair enough.
Another counterargument would be to say that if we ever do reach a “sufficiently critical try” (an uncontainable lack of control over that level of capability if misaligned), by that time getting it right the first time won’t be as preposterous as it is for present-day humanity, probably because with earlier AIs there will be far more effective cognitive labor and institutions around to make it work.