Many of Paul Christiano’s writings were valuable corrections to the dominant Yudkowskian paradigm of AI safety. However, I think that many of them (especially papers like Concrete Problems in AI Safety and posts like these two) also ended up providing a lot of intellectual cover for people to do “AI safety” work (especially within AGI companies) that isn’t even trying to be scalable to much more powerful systems.
I want to register a prediction that “gradual disempowerment” will end up being (mis)used in a similar way. I don’t really know what to do about this, but I intend to avoid using the term myself. I cluster my own research on related topics under headings like “understanding intelligence”, “understanding political philosophy”, and “understanding power”. To me this kind of understanding-oriented approach seems more productive than trying to create a movement based around a class of threat models.
I do agree there is some risk of the type you describe, but mostly it does not match my practical experience so far.
The “avoid using the term” approach makes little sense. There is a type difference between an area of study (‘understanding power’) and a dynamic (‘gradual disempowerment’). I don’t think you can substitute a term for an area of study for a term for a dynamic or threat model, so avoiding using the term could be done mostly by either inventing another term for the dynamic, or not thinking about the dynamic, or similar moves, which seem epistemically unhealthy.
In practical terms I don’t think there is much effort to “create a movement based around a class of threat models”. At least as the authors of the GD paper, when trying to support thinking about the problems, we use understanding-directed labels/pointers (Post-AGI Civilizational Equilibria), even though in many ways it could have been easier to use GD as a brand.
“Understanding power” is fine as a label for part of your writing, but in my view it is basically unusable as a term for coordination.
Also, in practical terms, gradual disempowerment does not seem like a particularly convenient set of ideas for justifying that working at an AGI company on something very prosaic which helps the company is the best thing to do. There is often a funny coalition of people who prefer not to think about the problem, including radical Yudkowskians (“GD distracts from everyone being scared of dying with very high probability very soon”), people working on prosaic methods with optimistic views about both alignment and the labs (“GD distracts from efforts to make [the good company building the good AI] win”), and people who would prefer it if everything were just a neat technical puzzle and there was no need to think about power distribution.
mostly it does not match my practical experience so far
I mostly wouldn’t expect it to at this point, FWIW. The people engaged right now are by and large people sincerely grappling with the idea, and particularly people who are already bought into takeover risk. Whereas one of the main mechanisms by which I expect misuse of the idea is that people who are uncomfortable with the concept of “AI takeover” can still classify themselves as part of the AI safety coalition when it suits them.
As an illustration of this happening to Paul’s worldview, see this Vox article titled “AI disaster won’t look like the Terminator. It’ll be creepier.” My sense is that both Paul and Vox wanted to distance themselves from Eliezer’s scenarios, and so Paul phrased his scenario in a way which downplayed stuff like “robot armies” and then Vox misinterpreted Paul to further downplay that stuff. (More on this from Carl here.) Another example: Sam Altman has previously justified racing to AGI by appealing to the idea that a slow takeoff is better than a fast takeoff.
Now, some of these dynamics are unavoidable—we shouldn’t stop debating takeoffs just because people might misuse the concepts. But it’s worth keeping an eye out for ideas that are particularly prone to this, and gradual disempowerment seems like one.
in practical terms, gradual disempowerment does not seem like a particularly convenient set of ideas for justifying that working at an AGI company on something very prosaic which helps the company is the best thing to do.
Well, it’s much more convenient than “AI takeover”, and so the question is how much people are motivated to use it to displace the AI takeover meme in their internal narratives.
when trying to support thinking about the problems, we use understanding-directed labels/pointers (Post-AGI Civilizational Equilibria), even though in many ways it could have been easier to use GD as a brand.
Kudos for doing so. I don’t mean to imply that you guys are unaware of this issue or negligent; IMO it’s a pretty hard problem to avoid. I agree that stuff like “understanding power” is nowhere near adequate as a replacement. However, I do think that there’s some concept like “empowering humans” which is a way to address both takeover risk and gradual disempowerment risk, if we fleshed it out into a proper research field. (Analogously, ambitious mechinterp is a way to address both fast take-off and slow take-off risks.) And so I expect that a cluster forming around something like human empowerment would be more productive and less prone to capture.
avoiding using the term could be done mostly by either inventing another term for the dynamic, or not thinking about the dynamic, or similar moves, which seem epistemically unhealthy
Yeah, “avoid using it altogether” would be too strong. Maybe something more like “I’ll avoid using it as a headline/pointer to a cluster of people/ideas, and only use it to describe the specific threat model”.
Rob Miles suggested ‘inexorable disempowerment’ as maybe better, on a call where we discussed the default associations of ‘gradual’.
Also, in practical terms, gradual disempowerment does not seem like a particularly convenient set of ideas for justifying that working at an AGI company on something very prosaic which helps the company is the best thing to do.
The bigger issue, as Jackson Wagner says, is that there’s a very real risk that the idea will be coopted by people who want to talk mostly about present-day harms of AI, at best siphoning resources from actually useful work on gradual disempowerment threats and AI x-risk in general, and at worst creating polarization around gradual disempowerment, with one party supporting the gradual disempowerment of humans and another opposing it, while the anti-gradual-disempowerment party/people become totally ineffective at dealing with the problem because it’s been taken over by omnicause dynamics.
Is your idea that “gradual disempowerment” isn’t a real problem or that it’s a distraction from actual issues? I’ve heard arguments for both, so I’m not sure what the details of your beliefs are. Personally, I see “gradual disempowerment” as a process that has already begun, but the main danger is AI deciding we should die, not humans living in comfort while all the real power is held by AI.
The impression I got from Ngo’s post is that:
- assorted varieties of gradual disempowerment do seem like genuine long-term threats
- however, by the nature of the idea, it involves talking a lot about relatively small present-day harms from AI
- therefore gradual disempowerment is highly at risk of being coopted by people who mostly just want to talk about present-day harms, distracting from both AI x-risk overall and perhaps even from gradual-disempowerment-related x-risk